ABSTRACT

Methods are presented (1) to partition or decompose a visual
scene into the bodies forming it; (2) to position these bodies in
three-dimensional space, by combining two scenes that make a
stereoscopic pair; (3) to find the regions or zones of a visual
scene that belong to its background; (4) to carry out the isolation
of objects in (1) when the input has inaccuracies. Computer
programs implement the methods, and many examples illustrate their
behavior. The input is a two-dimensional line drawing of the scene,
assumed to contain three-dimensional bodies possessing flat faces
(polyhedra); some of them may be partially occluded. Suggestions
are made for extending the work to curved objects. Some comparisons
are made with human visual perception.

The main conclusion is that it is possible to separate a picture 
or scene into the constituent objects exclusively on the basis of 
monocular geometric properties (on the basis of pure form); in fact, 
successful methods are shown. 

Thesis Supervisor: Marvin L. Minsky. 
Title: Professor of Electrical Engineering. 
If the machine is asked to separate the bodies, it must say

(BODIES ARE AS FOLLOWS: (18 9) (2 7) (3 5 6) (10 15) (4 13 14))

If asked to report the triangular prisms, it should answer 

(10 15 IS A TRIANGULAR PRISM) 

This thesis discusses the problems involved in this task.

What should be done when the information is noisy, some lines 
are missing, etc? 

How can the computer separate the background from the objects
forming the scene? 

How should shadows be handled? 

How can stereoscopic vision be used? 

What about ambiguities and optical illusions? 

This thesis also discusses some related aspects of human
visual perception.

Key words and phrases related to this study are as follows:



artificial intelligence 

body 

background 

background discrimination 

classification of images 

CONVERT 

cybernetics 

feature recognition 

geometric objects 

geometric processing 

graphic processing 

graphical communication 

graphical data 

heuristic procedures 

heuristic programming 

identification 

image 

intelligence 

line drawing 

LISP 

list processing 

machine aided cognition 

machine perception 

mechanisation of visual perception

object identification

optical

optical illusion

pattern

pattern matching

pattern recognition

photo-interpretation

picture 

picture abstraction 

picture processing 

picture transformations

pictorial structures 

polyhedra

recognition 

robot 

scene 

scene analysis 

solids 

stereoscopic 

symbol manipulation 

three-dimensional 

three-dimensional scenes 

three-dimensional solids 

two-dimensional patterns 

vision 

visual 

visual information processing 

visual object recognition 

visual perception 

visual scenes 
Computing Reviews (A.C.M.) index numbers: C.R. 3.61, 3.63,
4.22, 5.20.



Why this work was chosen as a thesis topic

The present work was carried out using the facilities of the
Artificial Intelligence Group of Project MAC, at M.I.T. Currently,
the main goal of the Artificial Intelligence Group (AI Group) is "to
extend the way computers can interact with the real world:
specifically to develop better sensory and motor equipment, and
programs to control them" {Minsky, Status Report II}. From such
efforts, a robot or mechanical manipulator has been constructed,
consisting of a PDP-6 computer, an image dissector camera, and a
mechanical arm and hand (see pictures).




IMAGE DISSECTOR CAMERA 



"These 'eyes and hands' are eventually to be able to do reasonably
intelligent things, but first, of course, it is difficult enough to
get them to do things that are easy for people to do." {Ibid.}



An image dissector silently watches a triangular prism in the
vision laboratory of the A.I. Group.
The work was naturally divided into visual information processing
(computer vision) and manipulation and control of the arm-hand.
Thus, when I came as a graduate student from the Politecnico de Mexico
to M.I.T. (Sept. 65) and became associated with the AI Group, I
found a great interest there in graphical communication with computers.
Moreover, it was felt that symbol manipulation techniques would be
relevant to this area. I was fortunate enough to have had some contact
with the LISP language in some of its implementations:
MB-LISP {McIntosh 1963}* and Hawkinson-Yates LISP {Hawkinson 64}
at the Centro Nacional de Calculo of the Politecnico; in fact, I
became interested in the area because I felt that it would be possible
to handle two-dimensional structures much in the same fashion as one
handles lists (that is, one-dimensional structures or strings of
symbols) in a pattern-driven language, such as CONVERT {1965}, recently
finished at that time.

The area also offered a good opportunity to understand and
evaluate several techniques, computers, equipment, etc. Consequently,
I decided to work in it.



(*) The braces { } always indicate a reference to the
bibliography at the end of this thesis, where the complete title,
date, etc., of the paper can be found.






SIMPLIFIED VIEW OF SCENE ANALYSIS 



TO THE BUSY READER 



This section presents a general view of the problems 
in the thesis and their solutions; if you are short of time, 

(1) Read the abstract and this section. 

(2) Choose some scenes from section 'Analysis of many scenes', 
and observe how the computer perceives them. 

(3) Look through the table of contents, select additional topics. 



Scene Analysis 

Scene analysis is the result of interaction between optical data
coming from the Eye and knowledge about the visual world stored in
the programs. In all that follows, the optical data entering through
the Eye is reduced to a line drawing; this step is called
preprocessing, and it will be only briefly sketched here.

After preprocessing, such a line drawing is analyzed in order to
discover and recognize given objects in it. The process is called
recognition. This thesis is concerned with recognition.

The stylised presentation that follows is only an example; in
particular, scene analysis does not need to follow the sequence
pre-processing -> recognition. See 'Division of work in Computer
Vision' on page 60.

We now give a simplified exposition of both processes. Recognition
will be discussed at length in the remainder of this thesis, since
it is the main topic; readers who wish for more information on
preprocessing or other approaches should consult the references, for
instance {my MS Thesis} and {A C Shaw FJCC 68}. See also page feo . 







Each inhomogeneous square is divided into four, again ignoring
the homogeneous sub-squares.

The process is repeated a few times more.
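The subdivision can be pictured as a quadtree-style recursion. The following is a minimal sketch under stated assumptions, not the preprocessor used in this work: the image is assumed to be a square grid of gray levels whose side is a power of two, `is_homogeneous` is a hypothetical test (intensity range within a tolerance `tol`), and squares of side `min_size` are kept rather than split further.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of gray levels (assumed square, side 2**k)
Square = Tuple[int, int, int]    # (row, col, size)


def is_homogeneous(img: Image, r: int, c: int, size: int, tol: int) -> bool:
    """Hypothetical homogeneity test: intensity range within tol."""
    values = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    return max(values) - min(values) <= tol


def subdivide(img: Image, tol: int = 10, min_size: int = 2) -> List[Square]:
    """Split inhomogeneous squares into four, ignoring homogeneous ones.

    Returns the smallest inhomogeneous squares; these straddle the
    boundaries that will later become lines and vertices.
    """
    pending: List[Square] = [(0, 0, len(img))]
    leaves: List[Square] = []
    while pending:
        r, c, size = pending.pop()
        if is_homogeneous(img, r, c, size, tol):
            continue                      # homogeneous squares are dropped
        if size <= min_size:
            leaves.append((r, c, size))   # small enough: keep it
            continue
        half = size // 2
        pending.extend([(r, c, half), (r, c + half, half),
                        (r + half, c, half), (r + half, c + half, half)])
    return sorted(leaves)


# A 4x4 image with a vertical step edge one pixel from the left border.
step = [[0, 100, 100, 100] for _ in range(4)]
print(subdivide(step, tol=10, min_size=2))   # [(0, 0, 2), (2, 0, 2)]
```

Only the two left-hand 2x2 squares survive, since they straddle the future line. A boundary that falls exactly on a square edge would escape this naive test, so a practical version would also compare neighboring homogeneous squares.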
The squares are now reduced to lines and vertices.
The resulting analysis gives us the first chance to start
working abstractly now, instead of continuing in "picture-point 
space." Preprocessing is finished. 




Recognition 



This and the next page 
describe proposed, but still 
unfinished, parts of the 
system. 



What follows is merely a brief summary of the processes in
recognition. A more systematic presentation and classification of
processes in recognition is found in 'Division of work in Computer
Vision', on page 60.

A program would check in the original scene, on both sides of
each line, for the continuation of textures, local cracks, etc.
across the line. On these and other grounds, shadows would be
picked up and erased:
A line-proposer program studies the abstract or "symbolic" scene and,
using some heuristics and general principles, proposes places where 
it is quite probable that a line is missing: 




These places are searched by a line-verifying program, which is a
specially sensitive test that uses fine measurements from the
original scene, and it will often pick up a boundary that was missed
in the less intelligent homogeneity phase. Here it can be practical
to apply a very strict and sensitive test, because the program
knows very accurately where the line should be, if it really exists
at all. For example, even if the two faces have almost equal
illumination, the Eye can pick up a thin, faint highlight from the
edge of the cube. It would have been hopelessly expensive to look for
such detailed phenomena over the whole picture at the start.
At this stage our program SEE (page 58) comes
into action. This program treats different kinds of local 
configurations as providing different degrees of evidence 
for 'linking' the faces. This evidence is obtained mainly 
at vertices, and at boundaries between regions. 

A vertex is in general a point of intersection of
two or more boundaries of regions. These regions might or
might not be faces of a single body. SEE examines the
configuration of lines meeting at the vertex to obtain
evidence relevant to whether the regions involved belong
to the same object.

For instance, in the vertex configurations "ARROW" and
"FORK" (a complete classification of vertices can be found
below in table 'VERTICES'),

[Figure: the vertex configurations "FORK" and "ARROW", with faces labeled a and b]
the "FORK" suggests linking face a to face b, b to c, and c to a.
The "ARROW" links a with b. A "LEG" (which depends on nearly
parallel lines) would add a weak link, in addition to the ordinary
(or strong) link placed by its "ARROW"; a "T" looks for a matching
"T", and if found, two strong links are placed as shown. Also, a
"T" counts against (that is, inhibits) linking a with c, or
b with c.

[Figures: "LEG" (weak link shown dotted); matching T's (two strong links)]

These links, for our example, are

[Figure: links for the example]

and may be represented as

[Figure: link diagram; weak links are dotted]

indicating two groups of linked faces, that is, two bodies:

(BODY 1. IS 1 2 4)
(BODY 2. IS 3 5 6)
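Read naively, the simplified example just collects faces that are connected by links. A minimal union-find sketch of that naive reading follows; the link list is hypothetical, since the figure that showed the actual links is not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple


def group_linked_faces(faces: Iterable[int],
                       links: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group faces into connected components of the link graph (union-find)."""
    parent: Dict[int, int] = {f: f for f in faces}

    def find(f: int) -> int:
        # follow parent pointers to the representative, halving the path
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in links:
        parent[find(a)] = find(b)   # union the two groups

    groups: Dict[int, List[int]] = {}
    for f in parent:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical links for the prism (faces 1, 2, 4) and the cube (3, 5, 6).
links = [(1, 2), (2, 4), (4, 1), (3, 5), (5, 6), (6, 3)]
print(group_linked_faces(range(1, 7), links))   # [[1, 2, 4], [3, 5, 6]]
```

Under this naive rule a single misplaced link would fuse two bodies into one; that is why SEE, as described on the following pages, insists on more than one link before merging.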

If, in addition, we give the computer at this point the definition
or concept of a 'triangular prism', through an abstract model of it
{my MS Thesis}, we can get

(1 2 4 IS A TRIANGULAR-PRISM)
(3 5 6 IS A CUBE)

Recognition has finished.



Analysis of several examples 

A larger variety of kinds of evidence is used in more complicated
scenes, making the program more intelligent in its answers:

(1) The links themselves are inhibited by conditions or configurations
at the neighboring vertices and faces; for instance, in the case
of a "FORK", the (strong) links indicated below are inhibited:



(2) 



(3) 






The links to the background are ignored [complete descriptions 
of conditions for producing and cancelling links are to be 
found in section 'SEE, a program that finds bodies in a scene']. 

A hierarchical scheme is used that first finds subsets of faces
that are very tightly linked (e.g., by two or more links). These
"nuclei" then compete for more loosely linked faces (faces linked
through one weak link and one strong link, or one face completely
unlinked except by one strong link).

By not considering a single link, weak or strong, as enough
evidence for assigning two faces as part of the same object, this
algorithm requires two "mistakes" (that is, two careless placements
of links between regions that should not be considered as
forming the same body) to make an identification error.



The bodies of the following scenes are found by SEE without 
difficulty. 




Note that of the strong links available to the "FORK" marked with
an arrow, two were prohibited or inhibited and only one is produced
by SEE.

Dotted links are weak.

In the following figure, the "FORK" of the big object is missing.





Statement of Rules. We will restate the rules under (3) above.

Region (definition). Surface bounded by simple closed curves.

We will also consider the outer background (:16 in fig. 'L10', page 59)
to be a region.

Nucleus (definition). A nucleus (of a body) is a set of regions. 

Linked nuclei (definition). Two nuclei A and B are linked if
regions a and b are linked, where a ∈ A and b ∈ B.

First rule: If two nuclei are linked by two or more strong links,
they are merged into a larger nucleus.
For instance, regions :8 and :11 are put together to form the
nucleus :8-11, because there exist two strong links between them.

Maximal nuclei: Starting from nuclei containing individual regions,
we let the nuclei grow and merge under the First rule, until no new
nuclei can be formed. When this is the case, the scene has been
partitioned into several "maximal" nuclei; between any two of these
there is at most one strong link.

For instance, regions :8 and :11 are put together by the First
rule; now we see that region :14 has two links with nucleus :8-11,
and therefore the new nucleus :8-11-14 is formed. This last is a
maximal nucleus.





For the moment, ignore the colons (:) in front of numbers. The
name of a region is a number preceded by a colon, such as :16.
The First rule is applied again and again, until all nuclei are
maximal nuclei; then the following rule is applied:

Second rule: If nuclei A and B are joined by a strong link and a weak
link, they are merged into a new nucleus.




The Third rule is applied after the Second rule.
Third rule: If nucleus A consists of a single region, has one link
with nucleus B, and has no links with any other nucleus, A and B are merged.

(10 11) does not join the bigger nucleus because (10 11) does not
consist of a single region. Below, 9 does not join (7 8) or (4 5)
because 9 has two links:




The Third rule tends to avoid proposing bodies consisting of a 
single region. 
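The three rules amount to a small fixed-point procedure over nuclei. The sketch below is a hedged reconstruction, not SEE's actual code: it assumes links arrive as (region, region, strength) triples with background links already removed, applies the First rule one merge at a time until it no longer fires, then the Second, then the Third, and for the Third rule counts a link of either strength. The exact order in which SEE performs merges may differ.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Link = Tuple[int, int, str]      # (region, region, "strong" or "weak")
Nucleus = FrozenSet[int]
Rule = Callable[[Nucleus, Nucleus, Set[Nucleus], List[Link]], bool]


def count_links(a: Nucleus, b: Nucleus, links: List[Link]) -> Tuple[int, int]:
    """Return (strong, weak): links joining a region of a to a region of b."""
    strong = weak = 0
    for x, y, kind in links:
        if (x in a and y in b) or (x in b and y in a):
            if kind == "strong":
                strong += 1
            else:
                weak += 1
    return strong, weak


def first_rule(a: Nucleus, b: Nucleus, nuclei: Set[Nucleus], links: List[Link]) -> bool:
    return count_links(a, b, links)[0] >= 2          # two or more strong links


def second_rule(a: Nucleus, b: Nucleus, nuclei: Set[Nucleus], links: List[Link]) -> bool:
    strong, weak = count_links(a, b, links)
    return strong >= 1 and weak >= 1                 # one strong plus one weak


def third_rule(a: Nucleus, b: Nucleus, nuclei: Set[Nucleus], links: List[Link]) -> bool:
    def lone(single: Nucleus, other: Nucleus) -> bool:
        # a single region with exactly one link, and that link goes to `other`
        if len(single) != 1 or sum(count_links(single, other, links)) != 1:
            return False
        return all(sum(count_links(single, n, links)) == 0
                   for n in nuclei if n not in (single, other))
    return lone(a, b) or lone(b, a)


def apply_until_stable(nuclei: Set[Nucleus], links: List[Link], rule: Rule) -> None:
    """Merge one qualifying pair at a time until the rule no longer applies."""
    merged = True
    while merged:
        merged = False
        ordered = sorted(nuclei, key=sorted)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if rule(a, b, nuclei, links):
                    nuclei -= {a, b}
                    nuclei.add(a | b)
                    merged = True
                    break
            if merged:
                break


def find_bodies(regions: List[int], links: List[Link]) -> List[List[int]]:
    nuclei: Set[Nucleus] = {frozenset([r]) for r in regions}
    for rule in (first_rule, second_rule, third_rule):
        apply_until_stable(nuclei, links, rule)
    return sorted(sorted(n) for n in nuclei)


links = [
    (1, 2, "strong"), (1, 2, "strong"),   # First rule: nucleus {1, 2}
    (3, 4, "strong"), (3, 4, "strong"),   # First rule: nucleus {3, 4}
    (2, 3, "strong"),                     # a single, possibly false, link
    (5, 6, "strong"), (5, 6, "weak"),     # Second rule: nucleus {5, 6}
    (7, 1, "strong"),                     # lone region 7: Third rule
]
print(find_bodies([1, 2, 3, 4, 5, 6, 7], links))   # [[1, 2, 7], [3, 4], [5, 6]]
```

The lone strong link between regions 2 and 3 is not enough to merge {1, 2} with {3, 4}, which mirrors the "two mistakes" property described earlier.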



The next example shows how three "false" links failed to lead 
SEE into error: 






Here three links were erroneously placed, but SEE did not get
confused by them.

In complicated scenes, objects may line up by coincidence. As a
result, vertices of different objects are merged, two objectively
different lines appear as one, and so on. The next example illustrates
these phenomena and shows how SEE copes with the problem.
SEE transforms the above scene as follows:
As we see, the nuclei are going to be correctly formed, and SEE will
also analyze this scene correctly. 

The bodies do not need to be rectangular, prismatic, or convex. They
only need to be rectilinear. As we will see later, even curved objects
may be identified, under certain restrictions (cf. Table 'ASSUMPTIONS').




Figure 'BRIDGE' 
All the bodies in "BRIDGE" are adequately found. A new heuristic is
used here:

three parallel lines comprising regions that are not background,
having the background as a neighbor, and with a 'T' on the center
line, give rise to a strong link, as shown above.

The following locally ambiguous scene is correctly parsed by 
our program: 





If we add another block to the right, the program makes a mistake and 
fails to see one of the inner cubes: 





Figure 'MOMO' also gets decomposed accurately: 




Figure 'MOMO' 
The local links allow correct identification of the following body:



If the lateral faces do not have parallel edges, a mistake occurs 
(conservative behavior, page 2*2): 






Another mistake occurs in the following scene: 







At left, the above mistake is not produced 
because vertex A links :2 and :8, by 
the new heuristic introduced in 'BRIDGE'. 



Conclusion 

The performance of this program shows that it is possible to 
separate a scene into the objects forming it, without needing to know 
the objects in detail; SEE does not need to know the 
'definitions' or descriptions of a pyramid, or a pentagonal prism, 
in order to isolate these objects in a scene containing them, even in
the case where they are partially occluded. 

The program will be fully analyzed in the following pages. 
Problems in analyzing a visual scene*

The problem of taking a two-dimensional image (or several such
images), and constructing from it a three-dimensional interpretation,
involves many operations that have never been studied, to say nothing
of being realized on a computer. We will list some of these here;
a more complete list is found in my M.S. Thesis {MAC TR 37}; some
have been side-stepped or ignored by the present recognition system;
the problems which we did solve are discussed in the text.

Among the facilities that must be available are: 

a) Spatial frame-of-reference: setting up a model of the relation
between the eye(s) and the general framework of the physical task,
i.e., where are the background, the "table" or working surface,
and the mechanical hand(s)?

b) Finding visual objects , and localizing them in space with respect 
to the eye-table-background-hand model. 

c) Recognizing or describing the objects seen , regardless of their 
position, accounting for partly-hidden objects, recognizing objects 
already "known" by descriptions in memory and representing the 
three-dimensional form of new objects. 

d) Building an internal "structural model" of what has been seen, 
for the purpose of task-goal analysis. 

Among the important factors are the effects of: 

1. Both the camera's focus and its depth of focus.

2. Illumination of the objects . Light affects the appearance of 
objects in obvious and subtle ways -- in scenes with multiple 
objects and lights we get complicated shadows, which have to 
be detected or rejected. The boundary between two faces may 
disappear if they get equal illumination from a diffuse light source. 

3. Perspective and distance effects. Even for geometric objects with
flat surfaces, the two-dimensional projection of their surface
features can take many forms, and the system has to be able to deal
with all of them. It works both ways, of course: once identified,
the appearance can give valuable information about the object's
orientation, size, and even (under some conditions) its absolute
spatial location {Roberts 1963}.

4. Accidental vs. essential visual features. Two objects of the same
shape and location can have very different visual presentations
because of their surface textures and markings. We need to
distinguish these two-dimensional "decorations" from real
three-dimensional spatial features.

* Adapted from Status Report II {Minsky 67}. See also Project MAC
Progress Report {1967, 1968}.

Other projects 



Here are the main robot groups at a panel discussion, 1968:





Chairman: 

DR. BERTRAM RAPHAEL 

Stanford Research Institute 
Menlo Park, California 

problems in the 
implementation of 
intelligent robots 

This session, the second of three sessions on robotry, will
consist of a panel discussion among technical people involved
in the design and construction of mechanical devices that are
capable of significant independent "intelligent" behavior, usually
by means of computer control. The projects represented on this
panel have drawn upon state-of-the-art capabilities in many
technologies including mechanical engineering, pattern recognition,
heuristic programming, neural networks and computer systems. Thus,
the discussion which will be conducted at a fairly technical
level should be of interest to engineers and scientists concerned
with the problems of interfacing a variety of disciplines, as well
as to those interested in learning about the nature of current
embryonic "robot" systems.
NOTE: Tickets priced at $5.00 each (including lunch) for
the all-day tour of "live robot" installations on Wednesday,
Dec. 11th, will be available at this session.



fall joint computer conference
DECEMBER 9-10-11
san francisco civic center



Panel Members 

MR. L. CHAITIN 

Artificial Intelligence Group 

Stanford Research Institute 

ROBOT STUDIES AT STANFORD RESEARCH 

INSTITUTE 

PROF. J. A. FELDMAN 

Computer Science Department 

Stanford University 

THE ROBOT PROJECT 

AT STANFORD UNIVERSITY 

DR. T. SHERIDAN

Dept. of Mechanical Engineering 
MIT 

HUMAN CONTROL OF REMOTE COMPUTER 
MANIPULATORS 

MR. R. J. LEE 

Air Force Avionics Lab. 
Wright-Patterson AFB 
GENERAL PURPOSE MAN-LIKE ROBOTS 

PROF. S. PAPERT 

Artificial Intelligence Project 

MIT, Project MAC 

THE MIT HAND-EYE PROJECT 

MR. L. SUTRO 

Dept. Aeronautics and Astronautics 
MIT 

ROBOT DEVELOPMENT AT THE 

MIT INSTRUMENTATION LABORATORY 






RELATED RESEARCH 



Previous work by the author 

CONVERT 

A programming language is described which is applicable to
problems conveniently described by transformation rules. By
this is meant that patterns may be prescribed, each being
associated with a skeleton, so that a series of such pairs may
be searched until a pattern is found which matches an expression
to be transformed. The conditions for a match are governed
by a code which also allows subexpressions to be identified and
eventually substituted into the corresponding skeleton. The
primitive patterns and primitive skeletons are described, as
well as the principles which allow their elaboration into more
complicated patterns and skeletons. The advantages of the
language are that it allows one to apply transformation rules
to lists and arrays as easily as strings, that both patterns and
skeletons may be defined recursively, and that as a consequence
programs may be stated quite concisely.

Abstract of the Convert paper in Comm. A.C.M.
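The pattern/skeleton cycle described in this abstract can be illustrated with a toy rewriter. This is not Convert syntax or its actual set of primitives; it is a minimal Python analogue, assuming nested lists stand for LISP lists and that strings beginning with '?' (a hypothetical convention) are pattern variables. Each rule is a (pattern, skeleton) pair, and the rules are searched in order until one matches.

```python
from typing import Any, Dict, List, Optional, Tuple


def match(pattern: Any, expr: Any, env: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Match expr against pattern; '?x' strings are variables.

    Returns the extended bindings, or None if there is no match.
    A variable that occurs twice must match equal subexpressions.
    """
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == expr else None
        return {**env, pattern: expr}
    if isinstance(pattern, list) and isinstance(expr, list):
        if len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            env = match(p, e, env)
            if env is None:
                return None
        return env
    return env if pattern == expr else None


def fill(skeleton: Any, env: Dict[str, Any]) -> Any:
    """Substitute the bound subexpressions into the skeleton."""
    if isinstance(skeleton, str) and skeleton in env:
        return env[skeleton]
    if isinstance(skeleton, list):
        return [fill(s, env) for s in skeleton]
    return skeleton


def transform(expr: Any, rules: List[Tuple[Any, Any]]) -> Any:
    """Try each (pattern, skeleton) pair in turn; rewrite with the first match."""
    for pattern, skeleton in rules:
        env = match(pattern, expr, {})
        if env is not None:
            return fill(skeleton, env)
    return expr  # no pattern matched: leave the expression unchanged


rules = [
    (["PLUS", "?x", 0], "?x"),                    # x + 0  ->  x
    (["PLUS", "?x", "?x"], ["TIMES", 2, "?x"]),   # x + x  ->  2 * x
]
print(transform(["PLUS", ["F", "Y"], ["F", "Y"]], rules))   # ['TIMES', 2, ['F', 'Y']]
```

The sketch only shows the match-then-substitute cycle on whole list elements; the richer primitive patterns and skeletons of the actual language are described in {MAC M 305}.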

Because it is easy to write and modify a program in Convert,
the language has been extensively used to quickly test "good"
and "great" ideas, new algorithms, etc. It is embedded in
the LISP of the PDP-6 computer (A.I. Group), in the IBM-7094
(Project MAC, MIT), in the CDC-3600 (Uppsala University, Sweden),
and in the SDS-940 (Univ. of California, Berkeley). A paper in the
Comm. A.C.M. and {MAC M 305} describe the language; examples of
simple programs written in Convert are in {MAC M 346}; a book
article {Patterns and Skeletons in Convert} is oriented
toward LISP users. For our Spanish readers, two
Bachelor's theses {Guzman 1965} {Segovia 1967} describe the
language and processors, and give examples.

SCENE ANALYSIS 

(1) Polybrick {MAC M 308} {Hawaii 69} is a Convert program that
works on a scene or picture, expressed as a line drawing, and finds
parallelepipeds in it.
(2) We would like to be able to specify in some suitable notation
models of the classes of objects we are interested in (such as 'cube',
'triangular prism', 'chair'), and make a program look for all instances
of any given model in a given scene or figure. Two arguments
would have to be supplied to our program: the model of the object
we are interested in, and the scene that we want to analyze.
Programs to do this are described in {AFCRL-67-0133} and {MAC M 342}.
In these early programs, partially occluded objects get incorrectly
identified. These programs are also written in Convert, and work
by transforming or compiling the model, written in a picture
description language, into a Convert pattern, which searches the
scene for instances of the model.

(3) A Master's Thesis {MAC TR 37} discusses many ways to identify 
objects of known forms. Different kinds of models and their proper- 
ties are analyzed. 

(4) It is important to be able to find the bodies that form a scene
without knowing their exact description or model. SEE is a program
that works on a scene presumably composed of three-dimensional
rectilinear objects, and analyses the scene into a composition of
three-dimensional objects. Partially occluded objects are usually
properly handled. This program was discussed in {MAC M 357},
{Guzman FJCC 68} and {Pisa 68}, and this thesis discusses a later
version.

(5) The present thesis goes beyond these topics to discuss also
handling of stereo information (two views, left and right, of the
same scene), improvements to deal with noisy (imperfect) input,
figure-background discrimination, and a few other subjects.

Canaday 



Rudd H. Canaday in 1962 analysed scenes composed of
two-dimensional overlapping objects, "straight-sided pieces of
cardboard." His program breaks the image into its component parts
(the pieces of cardboard), describes each one, gives the depth of
each part in the image (or scene), and states which parts cover which.
Roberts

The problem of machine recognition of pictorial data has long been a
challenging goal, but has seldom been attempted with anything more
complex than alphabetic characters. Many people have felt that research
on character recognition would be a first step, leading the way to a
more general pattern recognition system. However, the multitudinous
attempts at character recognition, including my own, have not led very
far. The reason, I feel, is that the study of abstract, two-dimensional
forms leads us away from, not toward, the techniques necessary for the
recognition of three-dimensional objects. The perception of solid
objects is a process which can be based on the properties of
three-dimensional transformations and the laws of nature. By carefully
utilizing these properties, a procedure has been developed which not
only identifies objects, but also determines their orientation and
position in space.

Three main processes have been developed and programed in this report.
The input process produces a line drawing from a photograph. Then the
three-dimensional construction program produces a three-dimensional
object list from the line drawing. When this is completed, the
three-dimensional display program can produce a two-dimensional
projection of the objects from any point of view. Of these processes,
the input program is the most restrictive, whereas the two-dimensional
to three-dimensional and three-dimensional to two-dimensional programs
are capable of handling almost any array of planar-surfaced objects.
{from Roberts}

Roberts in 1963 described programs that (1) convert a picture (a
scene) into a line drawing and (2) produce a three-dimensional
description of the objects shown in the drawing in terms of models
and their transformations. The main restriction on the lines is
that they should be a perspective projection of the surface
boundaries of a set of three-dimensional objects with planar
surfaces. He relies on perspective and numerical computations,
while SEE uses a heuristic and symbolic (i.e., non-numerical)
approach. Also, SEE does not need models to isolate bodies.
Roberts' work is probably the most important and closest to ours.



Mechanical Manipulator Groups (see also page 32).



Actually, several research groups (at the Massachusetts
Institute of Technology, at Stanford University, and at
Stanford Research Institute) work actively towards the
realisation of a mechanical manipulator, i.e., an intelligent
automaton that could visually perceive and successfully interact
with its environment, under the control of a computer. Naturally,
the mechanisation of visual perception forms part of their
research, and important work begins to emerge from them in this area.
THE CONCEPT OF A BODY

In this section definitions of a body or object will be proposed.
The criterion is that they agree in general with the common use of
the word 'body', while at the same time they should lend themselves
to implementation in a computer program.

Introduction 

Our ultimate interest is to examine a two-dimensional scene (a
picture, line drawing, or painting), presumably a representation
(projection, photograph) of a three-dimensional scene (a subset of
the "universe" or "real world"), and to find in it objects or bodies
contained in the real scene. More specifically, the aim is to find
the two-dimensional representations (projections, photographs) of
the different three-dimensional bodies present in the scene.

The phrase "two-dimensional representation of a three- 
dimensional body" will be shortened to "two-dimensional 
body" or even to "body", when no confusion arises. 

That is, we have to analyse a two-dimensional scene into collections 
of two-dimensional entities (surfaces, regions, lines) , each of which 
makes "three-dimensional sense" as a two-dimensional projection 
of a three-dimensional body. 

The problem is inherently ambiguous

A scene can be considered as a set of surfaces (faces or regions);
a body belonging to that scene is then an "appropriate" subset of this
collection. Therefore, the problem of finding bodies in a scene is
equivalent to the problem of partitioning the set into appropriate
subsets, each one of them representing or forming a body (scene "CHURCH").

The problem is inherently ambiguous, since different collections
of three-dimensional bodies can produce the same 2-dim scene; therefore
a given scene can be partitioned in many ways into bodies.
It is desired to make a "natural" partition or decomposition of the
scene, natural in the sense that it will agree with human opinion.*

To define a three-dimensional body is no problem [a philosopher may
disagree, perhaps in singular cases]:

Three-dimensional body (definition): A connected volume limited by a
continuous, two-sided surface composed of portions of planes.
Restriction: The above definition covers only polyhedral bodies,
that is, those having flat faces.
Restriction: No holes.
No-restriction: Bodies do not need to be convex.

Figure 'CHURCH'
Set of eight elements. Adequate subsets (bodies) are [2 4] and
[1 3 5 6 7 8]. In a more complicated example, people may
differ in their parsing of scenes.
Roughly speaking, a three-dimensional body is something that does not 
fall apart into pieces when lifted [this may be used as an operational 
definition of a body, given a mechanical manipulator to make the neces- 
sary tests]. 

Given a three-dimensional body, we generate a two-dimensional body 
by taking a picture of it, as follows. 

Two-dimensional body (definition). Figure formed by the projection of
a three-dimensional body. Generally, the projection is isometric or
perspective.
Thus, this is a view in two dimensions of a body, from some
particular point of view.

Unfortunately, a two-dimensional body could come from any of several
different 3-dim bodies or, what is worse, two 3-dim bodies together
can give rise to a single 2-dim body. For instance, in the following
figure,

* Without such a requirement, the problem has a trivial solution
(see Metatheorem on page 39).
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Figure 'B RUT' 
Two blocks, or a bent brick. 

this tiro-dijMialoaaL body cotild be generated by « "b«at brick' 1 or, by 
two Meek* adjacent* to each other. We are dealing with one three- 
dimensional body In the first case, *^fateQ*i&9*^z9&.*&&A 
2-di» entity (luMly, t** drawing of figure 'BBBC') Is the sane, and 
we are confronted with an tnherent aat^iiityV 

A more striking example is given in Fig. 'SIBELIUS', which could be the
representation of 363 cylindrical bodies, or the picture of a sculpture
(one body) in Helsinki.

Figure 'SIBELIUS'
Such colorful contradictions point towards the need to lay down
a more careful definition of our task. For instance, no one would think
that figure 'CUBE' contains three bodies. Nevertheless (see fig.
'PARALLELEPIPED' on the next page), that could be the case.

Fig. 'CUBE'
No one would think...

These two extremes are to be avoided by an appropriate definition
of a body and the corresponding computer program.




Legal scene (definition). That 2-dim scene in which each line is the
boundary of some region.

Three example drawings: legal scene; illegal; illegal.
See also comments to scene R3, and 'Illegal Scenes' (page 217), in 
section 'On noisy input'. 

Metatheorem. Any legal scene can always be the projection of one or
more three-dimensional objects.

To prove it, it suffices to note that each legal scene is composed
of regions, and each of them could be interpreted as the base of a
pyramid, all of whose lateral faces meet at an apex hidden behind
the base.

Therefore, each legal scene can be obtained by projecting or
photographing an adequate arrangement of such pyramids.
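The construction used in this proof is concrete enough to be written down. The following Python sketch is an illustration only, not part of SEE; it assumes orthographic projection along the z axis (viewer on the +z side) and convex regions, for which the average of the corners is an interior point (a non-convex region would need some other interior point for the apex). The names pyramid_for_region, project and strictly_inside_convex are hypothetical.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def pyramid_for_region(region: List[Point2], depth: float = 1.0) -> Tuple[List[Point3], Point3]:
    """Return (base, apex): base lies in the plane z = 0, apex `depth` units behind it."""
    n = len(region)
    # The average of the corners lies inside any convex polygon.
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n
    base = [(x, y, 0.0) for x, y in region]
    return base, (cx, cy, -depth)


def project(p: Point3) -> Point2:
    """Orthographic projection onto the image plane: drop z."""
    return (p[0], p[1])


def strictly_inside_convex(poly: List[Point2], q: Point2) -> bool:
    """True if q is strictly inside the convex polygon (either orientation)."""
    crosses = [
        (x2 - x1) * (q[1] - y1) - (y2 - y1) * (q[0] - x1)
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])
    ]
    return all(c > 0 for c in crosses) or all(c < 0 for c in crosses)


# A square region: the apex projects inside the base, so the drawing of
# the pyramid is just the square itself (its lateral faces are hidden).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
base, apex = pyramid_for_region(square, depth=3.0)
print(apex, strictly_inside_convex(square, project(apex)))
```

Because each apex projects inside its own base, each pyramid stays within the prism standing on its region, so the pyramids of different regions do not hide one another and their joint picture is the original scene.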



We can always construct a 
legal scene by photographing 
(or projecting) suitable 
3-dim polyhedra. 
Figure 'PARALLELEPIPED'
An improbable decomposition of a scene.
Trivial partition. By use of the metatheorem, we can always find a
decomposition of a visual scene into three-dimensional bodies; we
call this answer "trivial". Humans do not split scenes this way.
Our program should not, either.

But the metatheorem points out that "impossible scenes" are never
found among the legal scenes (see section 'On Optical Illusions');
these always have at least one interpretation.

We are trying to give criteria for proposing bodies that will 
suit our ends, which are to define a "reasonable" or "standard" body. 
This will permit us to judge the performance of a program designed 
to find objects in a scene. 

Several criteria are possible: 

1. Roberts {1963} suggests: given several models of three-dimensional
bodies, use some numerical technique, such as least-squares
fitting, to find which model fits best through a suitable
transformation, and accept this match if the error is tolerably
small. Complicated compositions of elementary bodies are considered.

2. Ledley {1962} would propose: in terms of suitable primitive components
(arcs, legs, etc.), make a syntactic analysis of the scene,
with the help of a grammar, in such a way that the models of
the object you want to identify are formed recursively from
these primitive components and (perhaps) other bodies.
Narasimhan {1962} and Kirsch {1964} would agree with this
linguistic approach. A. C. Shaw {Ph. D. Thesis} assents.

3. Guzman {1967} suggests: prepare models which specify a fixed 

topology but where other relations (length of sides, paralle- 
lism of two lines, equality of angles) are specified through 
the use of open variables (UAR variables, in CONVERT). 
Evans {1968} would agree with that. 

These approaches require the existence of a model which describes the
object to be identified; the model specifies a particular 3-dim object
(or a class of them). These approaches answer more than what
was asked; they tell not only "yes, it is a body", but also
"it is a pyramid". The current question is more general.
It is desired to know if something is a body, any body,
even one which has not been seen before.

If it were possible to implement a program to answer that question, 
then that would be a working definition of a body. SEE is a program 
which comes close to this goal, so that it could be pragmatically stated: 

2-dim body "à la SEE" (definition). A body is each set of regions
recognised by the program SEE as such.

This definition invites the following

Criticism: A perfect way to hunt lions is to capture any entity E, and
to call that a lion, by definition.

That is, although this definition is precise, SEE may make
decisions "contrary to common sense"; also, for purposes of judging
the behavior of the program, this definition is useless, since SEE
will be perfect 100 per cent of the time, irrespective of its answers.

We are, finally, tempted to conclude that "common sense", or
better, "human common sense", plays a role in the definition of a body,
since what we are trying to characterise is a usual body, normal body,
common body, and so on. But even people may differ in their parsings of
scenes. We could, of course, give a scene (such as 'MOMO' on page ??)
to 100 subjects, ask them to identify the different bodies in it, and
come up with some sort of 'average' or 'general consensus':

2-dim body (statistical and human-behavioral definition). Each one of
the subsets into which a scene is partitioned by many subjects.
It is understood that, in this spirit, the human subjects should be
motivated to satisfy a

Simplicity criterion: Of the several "reasonable" interpretations
(decompositions) of a scene, the one which contains the smallest
number of bodies is preferable.
That is, an explanation or decomposition is simpler (and preferable)
if it can be done with fewer parts.
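The statistical definition given above can be made operational in many ways. One possible reading, sketched here under stated assumptions (it is not a procedure of this thesis), joins two regions whenever a strict majority of subjects put them in the same body, and takes the bodies to be the connected groups of that relation. The function consensus_bodies and the input layout, one list of region sets per subject, are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def consensus_bodies(regions: List[Hashable],
                     parsings: List[List[Set[Hashable]]]) -> List[Set[Hashable]]:
    """Bodies agreed on by a strict majority of subjects (hypothetical rule)."""
    # For each subject, map every region to the index of its body.
    body_of = [{r: i for i, body in enumerate(p) for r in body} for p in parsings]
    parent: Dict[Hashable, Hashable] = {r: r for r in regions}

    def find(r: Hashable) -> Hashable:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in combinations(regions, 2):
        votes = sum(1 for m in body_of if m[a] == m[b])
        if 2 * votes > len(parsings):  # strict majority joins the two regions
            parent[find(a)] = find(b)

    groups: Dict[Hashable, Set[Hashable]] = {}
    for r in regions:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


# Three hypothetical subjects parsing figure 'CHURCH' (regions 1..8).
answers = [
    [{2, 4}, {1, 3, 5, 6, 7, 8}],
    [{2, 4}, {1, 3, 5, 6, 7, 8}],
    [{2, 4}, {1, 3}, {5, 6, 7, 8}],
]
print(consensus_bodies(list(range(1, 9)), answers))
```

Because pairwise majorities are chained, the result may merge regions that no single subject grouped together; a real study would have to check for this.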

Simplicity is not to be achieved at any cost, since the parsing
of the scene has to produce 'plausible' bodies: "simplicity" could
always be achieved by reporting each scene as a single, gigantic body,
obtained perhaps from more familiar ones through liberal use of
adhesives (cf. also Sibelius' Monument).

The chief choices are surely:
- To choose a parsing, or
- To list many (perhaps rank-ordered) in case of ambiguity.

If we select the first alternative, further choices are
- to have a natural parsing (human);
- to have a canonical parsing, in the sense of minimising
some variable (the minimisation of the number of bodies
leads us to Sibelius' Monument, its maximisation to the
Trivial Solution of the metatheorem [page 41]).
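The canonical alternative can be sketched as a minimisation restricted to plausible candidates. In the sketch below, which is illustrative only, the candidate parsings and the plausibility test are assumptions: the test stands in for whatever notion of 'plausible body' is adopted, and without it the minimum is always the single-body answer. The regions 1 to 8 of figure 'CHURCH' serve as data; the function names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional

Parsing = List[FrozenSet[int]]


def canonical_parsing(candidates: Iterable[Parsing],
                      plausible: Callable[[Parsing], bool]) -> Optional[Parsing]:
    """Among plausible candidates, return one with the fewest bodies."""
    best: Optional[Parsing] = None
    for parsing in candidates:
        if plausible(parsing) and (best is None or len(parsing) < len(best)):
            best = parsing  # on ties, the first candidate seen is kept
    return best


# Candidate parsings of figure 'CHURCH' (regions 1..8).
trivial = [frozenset({r}) for r in range(1, 9)]
natural = [frozenset({2, 4}), frozenset({1, 3, 5, 6, 7, 8})]
single = [frozenset(range(1, 9))]
candidates = [trivial, natural, single]


def apart_1_2(p: Parsing) -> bool:
    """Toy plausibility test (an assumption): regions 1 and 2 never share a body."""
    return not any({1, 2} <= body for body in p)


print(canonical_parsing(candidates, apart_1_2))
```

With the toy test the two-body parsing of figure 'CHURCH' is chosen; with a test that accepts everything, the single gigantic body wins, as the text warns.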

Other kinds of 2-dim data

We have been discussing the identification of 3-dim bodies (through
their 2-dim projections) in a 2-dim scene, purely on the basis of
geometric regions. Many other kinds of information could be used,
such as texture, color, and shadows.

Nevertheless, it is interesting
to see how far the identification
of bodies can go if only geometric
properties are used.

Finding bodies in a 2-dim scene is a task not very precisely
defined, because of the ambiguities inherent in any projection process.
On these grounds, the concept of 'body' is best described through
familiarity, human opinion and consensus. We are forced to this because
any scene could be partitioned in several ways (cf. fig. 'PARALLELEPIPED'),
only some of which may be considered plausible or 'sensible' (natural,
common, standard) partitions with regard to the bodies forming it.
TOTAL ANALYSIS OF VERTICES

Here a scene is considered as formed by several regions;
bodies are adequate collections of regions. The problem of identifying
bodies is restated as the problem of finding whether two regions
belong or do not belong to the same body. This question is answered
by examining the vertices of the scene.

It is shown that a single vertex never conveys conclusive
evidence, so that at least a pair of vertices is required to isolate a
body; familiar and unfamiliar configurations of objects help to
understand how the vertices are to be used in this task.

Vertices are the important feature 



All faces of polyhedra are bounded 
by edges. 

All edges terminate in vertices. 

- This thesis deals with the analysis of visual scenes composed
mainly of three-dimensional planar objects. These are limited by
flat surfaces.

- All these bodies share a common feature, the edge: the place where
two planes (faces) meet (but see page 57).

- Wherever several edges or faces meet, a vertex appears. This is
also a common feature of all the bodies.





A body is formed by vertices with edges connecting some of these.
When a 3-dim body is projected into a 2-dim body, its 3-dim vertices
(which we will call genuine 3-dim vertices) are transformed into
genuine 2-dim vertices, known as images of the 3-dim vertices, as
figure 'GENUINE' (on the next page) indicates.

That is, a genuine 2-dim vertex has come from a genuine 3-dim
vertex. Some 2-dim "false" vertices appear too; they do not come
from genuine 3-dim vertices, but rather from the partial occlusion
of parts of opaque bodies [transparent objects give rise to a different
kind of false vertices; Guzman {MS Thesis} deals with them by using
transparent models, and a mode of operation of TD, the recognizer,
that re-interprets or ignores certain types of vertices {AFCRL-67-0133}].
False vertices do not belong to any object.

Figure 'GENUINE'
Two 3-dim bodies, one of them showing its genuine 3-dim vertices.
A 2-dim scene containing two 2-dim bodies, one of them showing its
genuine 2-dim vertices. Three false vertices also appear.
A genuine vertex (such as G1') is one whose counterimage
(G1 in this case) belongs to some body; a false vertex,
such as F2', is a virtual intersection, and generally
has no counterimage in the 3-dim world. See fig. 'NODES'.

Genuine and false vertices. The classification of vertices into the
categories "genuine" and "false" will allow the isolation of objects in a
picture; in fig. 'GENUINE', elimination of vertices F1', F2', and F3'
divides the genuine nodes of the network (see fig. 'NODES') into two
non-connected components, A and Q, correctly separating the two bodies.




Figure 'NODES'
False vertices arise from the intersection of two
projected edges, one of which is typically occluded
in part by a face bordered by the other. Elimination
of the false vertices divides the network into two
separate components, which are the bodies sought.
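The elimination of false vertices can be sketched as a graph computation: delete the false vertices from the network of vertices and lines, and take the connected components of what remains. The sketch below assumes the genuine/false labels are already known, which is exactly what the following paragraphs show to be hard. It simply cuts every line through a false vertex; this works when, as in the drawing of a block, the occluding body's genuine vertices stay connected through their other lines. The network and all names are hypothetical, only loosely patterned on fig. 'GENUINE'.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def adjacency_from_lines(lines: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected network: each line links its two end vertices."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in lines:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bodies_from_genuine(adj: Dict[Hashable, Set[Hashable]],
                        false_vertices: Set[Hashable]) -> List[Set[Hashable]]:
    """Delete the false vertices, then return the connected components
    of the remaining (genuine) vertices."""
    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for start in adj:
        if start in false_vertices or start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search that never enters a false vertex
            v = queue.popleft()
            for w in adj[v]:
                if w not in false_vertices and w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Body A: a block drawn as hexagon a1..a6 with inner vertex a0.
# Body Q (q1, q2) is partly hidden by A; its lines meet A's outline
# at the false vertices f1 and f2.
lines = [("a1", "f1"), ("f1", "a2"), ("a2", "a3"), ("a3", "f2"), ("f2", "a4"),
         ("a4", "a5"), ("a5", "a6"), ("a6", "a1"),
         ("a0", "a1"), ("a0", "a3"), ("a0", "a5"),
         ("f1", "q1"), ("q1", "q2"), ("q2", "f2")]
network = adjacency_from_lines(lines)
bodies = bodies_from_genuine(network, {"f1", "f2"})
print(bodies)
```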

This suggests the following 
2-dim body (first approx. to definition). Set of regions possessing 

only genuine vertices, and separated from other bodies 

by false vertices. 
In this way, the problem of identifying bodies is equivalent to the 
problem of identifying genuine vertices, segregating the false ones. 






Problems to be solved. The computation of this equivalence is
challenged by several problems:

- The distribution and position of bodies may be such that false
vertices look like genuine vertices (fig. 'CAUTION').




Fig. 'CAUTION'
That vertex looks genuine, but is false.

Global information (analysis of more than one vertex) is needed
in general to distinguish them. In other words, although false
vertices are those which separate two bodies, and 2-dim genuine
vertices originate from 3-dim genuine vertices, telling them apart
requires more than a simple analysis of their shape.

- Some genuine vertices look like false vertices.

- Genuine vertices of a body may not be present in the scene, or
may be supplanted by false vertices.

- A single body may have totally disconnected sections (portions).

- Continuation is not clear: some doubts arise as to whether the object
in the foreground covers one or two bodies (fig. 'CONTINUATION');
the simplicity criterion prefers the single-body interpretation.
Fig. 'CONTINUATION'
Continuation is not clear. 



In brief, difficulties are of two kinds:

- Genuine and false vertices cannot be distinguished
locally (see Theorem below).

- Even when they are completely classified, the problem of
fig. 'CONTINUATION' remains.

The solution of these problems will have to make use of more global 
information. 



Classification of vertices. Table 'VERTICES' (on the next page)
classifies vertices according to their form, the number of lines, and
the angles among the lines. It contains the most common types; vertices
having more edges could have been included.

Let us consider one of these types, ARROW. Three regions, called
1, 2, and 3, form it. The standard, most common
ARROW configuration is a body with faces 1 and 2
seen against some other object 3. We indicate
this by [(1 2) (3)]. However, all other configurations are possible:

[(1) (2) (3)]    [(1 3) (2)]    [(1) (2 3)]    [(1 2 3)]
'L'.- Vertex where two lines meet.

'FORK'.- Three lines forming angles smaller than 180 degrees.

'ARROW'.- Three lines meeting at a point, with one of the angles
bigger than 180 degrees.

'T'.- Three concurrent lines, two of them collinear.

'K'.- Two of the lines are collinear, and the other two fall on the
same side of such lines.

'X'.- Two of the lines are collinear, and the other two fall on
opposite sides of such lines.

'PEAK'.- Formed by four or more lines, when there is an angle bigger
than 180°.

'MULTI'.- Vertices formed by four or more lines, and not falling in
any of the preceding types.

TABLE 'VERTICES'
Classification of rectilinear vertices.
Thus, for an ARROW, all the groupings of its faces are possible; any
procedure that tries to decide how its faces are grouped into bodies
by looking at the ARROW alone will make mistakes in some scenes.

The generalization of the above analysis to all other types of
vertices proves the following

"Theorem". There does not exist a set of local decision procedures
{μ_i}, each one looking at or getting information from one vertex
and establishing b-equivalences among some of its faces
(two faces a and b are b-equivalent, indicated a ≡ b, if
the μ_i decides that they belong to the same body; this is
an equivalence relation), using information only from that
vertex (it does not look at the other vertices or at the values
of the μ's at the other vertices), which will partition all
scenes correctly.

That is, the following machine will not work for all scenes:

(Diagram: the local procedures μ_1, μ_2, ..., each attached to one
vertex, all feeding a box on the right.)

The μ_i decide by processing information at exactly one vertex;
the box on the right accepts all these decisions and passes them on as
results. No matter what set of μ_i we choose, there exists a scene
that induces an incorrect partition by our machine.
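The machine of the theorem can be sketched in a few lines. Each local procedure (μ_i in the text) sees one vertex and proposes pairs of faces to be b-equivalent; the box on the right merges all proposals, and union-find gives the transitive closure. The vertex records, the function names, and the sample rule arrow_rule (which always chooses the standard ARROW grouping [(1 2) (3)]) are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Set, Tuple

# A vertex as seen by a local procedure: (type-name, regions around it).
Vertex = Tuple[str, Tuple[Hashable, ...]]
LocalRule = Callable[[Vertex], Iterable[Tuple[Hashable, Hashable]]]


def partition(regions: Iterable[Hashable], vertices: List[Vertex],
              mu: LocalRule) -> List[Set[Hashable]]:
    """Apply the local rule at every vertex, then merge its b-equivalences."""
    parent: Dict[Hashable, Hashable] = {r: r for r in regions}

    def find(r: Hashable) -> Hashable:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for v in vertices:  # each decision sees exactly one vertex
        for a, b in mu(v):
            parent[find(a)] = find(b)

    bodies: Dict[Hashable, Set[Hashable]] = {}
    for r in list(parent):
        bodies.setdefault(find(r), set()).add(r)
    return list(bodies.values())


def arrow_rule(v: Vertex) -> Iterator[Tuple[Hashable, Hashable]]:
    """Hypothetical rule: at an ARROW whose regions are given as
    (left, right, reflex), join left and right, i.e. always [(1 2) (3)]."""
    name, regions = v
    if name == "ARROW":
        yield regions[0], regions[1]


print(partition([1, 2, 3], [("ARROW", (1, 2, 3))], arrow_rule))
```

Whatever rule is plugged in, an ARROW that looks the same in two scenes receives the same decision in both, so a scene in which faces 1 and 2 belong to different bodies is partitioned incorrectly; this is the content of the theorem.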
A stronger assertion is that, in view of inherent ambiguity,
there is not even any global procedure that partitions all scenes
correctly.

All the different groupings of regions of a vertex into bodies
are possible; this is illustrated by the following complete set of
scenes, each one of them showing a different partitioning of a type
of vertex. These examples are also useful in giving an idea of
unusual, as well as familiar, scenes; we will have occasion later to
use them, when searching for heuristics to form bodies.

Generation of partitions

compo ((1 2))
1   ((1) (2))
2   ((1 2))

There are only two partitions of a set of two elements.

Partitions of a set of 3 elements

compo ((1 2 3))
1   ((1) (2) (3))
2   ((1 2) (3))
3   ((1 3) (2))
4   ((1) (2 3))
5   ((1 2 3))

Partitions of a set of 4 elements

compo ((1 2 3 4))
1   ((1) (2) (3) (4))
2   ((1 2) (3) (4))
3   ((1 3) (2) (4))
4   ((1 4) (2) (3))
5   ((1) (2 3) (4))
6   ((1 2 3) (4))
7   ((1 4) (2 3))
8   ((1) (2 4) (3))
9   ((1 2 4) (3))
10  ((1 3) (2 4))
11  ((1) (2) (3 4))
12  ((1 2) (3 4))
13  ((1 3 4) (2))
14  ((1) (2 3 4))
15  ((1 2 3 4))

Figures in the next pages are numbered according to the numbers
in the leftmost column of these tables.
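The partitions in these tables can be generated mechanically. The sketch below uses a standard recursive construction (the first element either forms a block of its own or joins one block of each partition of the remaining elements); it is not claimed to be the routine that produced the tables, and it lists the partitions in a different order from the numbering above. The function name is hypothetical.

```python
from typing import Iterator, List, Sequence


def partitions(elements: Sequence[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # The first element forms a block of its own ...
        yield [[first]] + smaller
        # ... or joins one of the existing blocks.
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


for n in (2, 3, 4):
    print(n, sum(1 for _ in partitions(list(range(1, n + 1)))))
```

The loop prints the counts 2, 5 and 15, the sizes of the three tables.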
Digression 1. An alternate approach



Suggestion 



As an alternate approach, one could try to use the faces as a 
basis for identification. For instance, use two scenes (left image, 
right image) or pictures, locate a sharp feature in one of them
(vertex, crack in the face, peculiar texture, etc.) and by correlation 
or some other method, find it also in the other picture. Having 
found a few points in both images in this manner, determine the plane 
of the face, in 3-dim space. When several faces are thus identified, 
we can compute, if desired, their intersection and obtain the edges 
(lines). It will generally suffice to ignore the edges and rely on 
the faces. Since it is reasonable to expect considerable difficulty 
in finding lines and in differentiating lines caused by edges from 
those caused by shadows, an approach which avoids the lines altogether 
looks promising. But in this case, in addition to requiring two 
images, several correlations are needed (if we choose this method), 
a generally time-consuming and error-prone task. 
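The geometric part of this suggestion can be sketched as follows, assuming the 3-dim positions of a few matched points on each face have already been obtained from the two images (the correlation step is not shown). Three non-collinear points fix a face's plane n · x = d, with n the cross product of two vectors lying in the face; two face planes meet in an edge whose direction is n1 × n2. The function names are illustrative only.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_through(p: Vec, q: Vec, r: Vec) -> Tuple[Vec, float]:
    """Plane n . x = d through three matched points of one face."""
    n = cross(sub(q, p), sub(r, p))
    if dot(n, n) == 0:
        raise ValueError("the three points are collinear")
    return n, dot(n, p)


def edge_line(plane1: Tuple[Vec, float], plane2: Tuple[Vec, float]) -> Tuple[Vec, Vec]:
    """Direction and one point of the line where two face planes meet."""
    (n1, d1), (n2, d2) = plane1, plane2
    u = cross(n1, n2)
    denom = dot(u, u)  # equals |n1|^2 |n2|^2 - (n1 . n2)^2
    if denom == 0:
        raise ValueError("the faces are parallel")
    n11, n22, n12 = dot(n1, n1), dot(n2, n2), dot(n1, n2)
    # The point a*n1 + b*n2 satisfies both plane equations.
    a = (d1 * n22 - d2 * n12) / denom
    b = (d2 * n11 - d1 * n12) / denom
    point = tuple(a * x + b * y for x, y in zip(n1, n2))
    return u, point


top = plane_through((0, 0, 1), (1, 0, 1), (0, 1, 1))    # face z = 1 of a unit cube
front = plane_through((0, 0, 0), (1, 0, 0), (0, 0, 1))  # face y = 0
direction, point = edge_line(top, front)
print(direction, point)
```

For the unit cube, the two faces meet in the edge running along x through (0, 0, 1).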
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("ST— - 



SEE, A PROGRAM THAT FINDS BODIES IN A SCENE 



Synopsis 



How SEE works. 



Algorithms and heuristics are presented, implemented in a
program, that analyze a scene into a composition of three-dimensional
objects. Only the two-dimensional representation of the three-
dimensional scene is available as input, and it is described by a
collection of surfaces, lines and vertices.

SEE looks for three-dimensional objects in two-dimensional scenes.
The program does not require a preconceived idea of the form of the
objects which could appear in the scenes. It is only assumed that
they will be solid objects formed by plane surfaces. Thus, SEE cannot
find "pentagonal prisms" or "houses" in a scene, since it does
not know what a "pentagonal prism" is; but it will usually isolate
the pentagonal prisms (or any other regular or irregular solid) in a
scene, even if some of them are partially occluded, without having
a description of such objects. It does this by paying attention
to configurations of surfaces and lines which would make plausible
three-dimensional solids, and in this way 'bodies' are identified.

The analysis that SEE makes of the different scenes generally
agrees with human opinion, although in some ambiguous cases it
tends to be conservative. The most interesting thing about the
program is how well it deals with occlusions. Many examples in
the next section, 'Analysis of many scenes', illustrate the features
and peculiarities of the program, and also illustrate the effects
of inaccuracies introduced in the data.
INTRODUCTION

Here is a program that locates objects in an optical image of a
scene most likely composed of three-dimensional solids, perhaps
occluding one another, so that some of them may not be totally
visible. We use a line drawing as our representation of the scene.

The analysis of scene L10 (see figure 'L10' on the next page) by
our program, named SEE, produces

(BODY 1. IS :5 :1 :4 :12)
(BODY 2. IS :6 :15 :7 :11 :14)
(BODY 3. IS :8 :9 :10 :3)
(BODY 4. IS :2 :13)



Division of work in computer vision 

In trying to construct a program for seeing, several approaches 
are possible; most of them require some of the following set of 
modular programs or subroutines. 

Pre-processing. Converts the image from a 2-dim array of intensities
to a symbolic representation or 'internal format' (page 66), in
terms of vertices and lines connecting them.

Homogeneity predicates. They decide if areas of the picture are
inhomogeneous, and hence require further analysis (page Ife).

Color predicates. Boundaries of different color suggest lines.

Line finder. Locates lines of points having a certain property
(such as being inhomogeneous, or having a large light-intensity
gradient).

Vertex finder. Concurrent lines are merged, or a vertex is created
at their meeting point.

Consolidator. Eliminates the false lines and finds more lines,
increasing in this way as much as possible the reliability of the
system.

Illumination program. Discovers where the main light sources are.

Shadows program. Detects shadows so as to eliminate them.

Missing lines program. General shape considerations suggest places
where faint lines can remain undetected.

Body recognition. Partitions the scene into appropriate subsets, each
one being a body or object. Thus, SEE is a body-recognition program.

Object identification. These objects are compared against abstract
descriptions (models) of cubes, pyramids, etc., so that a classification
is done, and a name is attached to each one. In the process, certain
parameters may acquire values: for instance, the height of a pyramid
is observed.

Positioning. Having analyzed the scene, the relevant objects are
positioned in three-dimensional space, and additional relations among
them are discovered (support, obstruction, etc.). Enough information
is obtained to allow the mechanical arm to manipulate the objects and
achieve its goals.

Stereo. More than one view is analyzed (page ZSi) and from them,
3-dim spatial positions are found.

Focussing. The computer, by adjusting the focus of its lens,
acquires knowledge of how far away the objects are.

Feedback among these parts becomes more necessary as the complexity of
the scene and of the desired goals increases.

Recognizer. The task of body recognition and body identification was
formerly accomplished by a single program (for instance, DT or TD {my
MS Thesis}) that compares the symbolic description of the scene against
the symbolic or abstract description of the model of the desired object,
in a kind of two-dimensional matching, to isolate instances of that
object in the scene.

Technical descriptions of SEE 

1. Annotated listings. Above all, the primary source of information
is the listing of the programs, which appears complete in this thesis.
They are written in Lisp. If, despite my efforts, some of my explanations
are not clear, consult it: it is annotated. The programs themselves,
examples, test data, results, instructions, etc., are on the DEC
magnetic tape "GUZMAN F" at Project MAC (AI group). Instructions
are given on page 78.

2. This section of the thesis contains a description and discussion 
of the different algorithms and procedures used. 

3. Published papers that cover part of the material at somewhat
less depth, and are therefore more readable, are also available
{FJCC 68} {Pisa 68}. Apart from some examples not included here,
they contain no information that is not covered here.

4. An internal report {MAC M 357} described an earlier version of SEE. 




FIGURE 'R3'
A scene. 






INPUT FORMAT 

Eventually, several preprocessors will be able to receive data
through an input camera and reduce it to the "internal format" of a
scene, in the form required by SEE. For testing purposes, the scenes
are entered by hand in a simplified format, called 'input format',
to be described now. All the scenes analyzed by SEE have been written
in input format.

Example. R3. The input format of scene R3 is

(DEFPROP R3 (%:7) BACKGROUND)

(NOT (SETQ R3 (QUOTE (
%A 4.3 4.5 (%:7 %G %:4 %C %:1 %B)
%B 4.0 5.7 (%:7 %A %:1 %D)
%C 4.8 8.5 (%:4 %F %:2 %D %:1 %A)
%D 4.5 9.15 (%:7 %B %:1 %C %:2 %E)
%E 5.65 9.25 (%:7 %D %:2 %F)
%F 5.85 8.6 (%:7 %E %:2 %C %:4 %G)
%G 6.6 5.2 (%:7 %F %:4 %A)
%H 6.9 15.4 (%:7 %L %:3 %K %:5 %I)
%I 8.5 16.0 (%:7 %H %:5 %J)
%J 11.8 12.6 (%:7 %I %:5 %K %:6 %N)
%K 10.0 11.9 (%:6 %J %:5 %H %:3 %M)
%L 7.1 13.2 (%:7 %M %:3 %H)
%M 10.0 9.7 (%:7 %N %:6 %K %:3 %L)
%N 11.65 10.3 (%:7 %J %:6 %M)
))))

R3 IN INPUT FORMAT

The first line declares :7 to be the background.* We have to
tell SEE which regions belong to the background. If this information
is missing, a program is called that will compute the regions that
belong to the background (see section 'Background discrimination by
computer') prior to other calculations.

After that, the remaining lines associate with each vertex its 2-dim
coordinates and a list (which will later be called 'KIND'), in
counterclockwise order, of the regions and vertices radiating from
that vertex.

The function PREPARA (see listing) converts the scene as just given 
to the "internal format" form which SEE expects. It does this by putting 
many properties in the property lists of the atoms representing vertices 
and regions (property lists in Lisp get explained in next page). 
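PREPARA itself is a Lisp function and is not reproduced here. As a rough modern analogue, the Python sketch below reads vertex lines laid out as in the example above (name, x, y, then the parenthesised KIND list) into per-vertex property dictionaries; it does not handle the DEFPROP and SETQ wrapper lines, and the function names are hypothetical.

```python
import re
from typing import Any, Dict, Set

# name, x, y, then the parenthesised KIND list, e.g. "%J 11.8 12.6 (%:7 %I ...)"
VERTEX_LINE = re.compile(
    r"^\s*(%\S+)\s+(-?\d+(?:\.\d*)?)\s+(-?\d+(?:\.\d*)?)\s*\(([^()]*)\)\s*$")


def read_input_format(text: str) -> Dict[str, Dict[str, Any]]:
    """Map each vertex atom to its XCOR, YCOR and KIND properties."""
    scene: Dict[str, Dict[str, Any]] = {}
    for line in text.strip().splitlines():
        match = VERTEX_LINE.match(line)
        if match is None:
            raise ValueError(f"not a vertex line: {line!r}")
        name, x, y, kind_text = match.groups()
        kind = kind_text.split()
        # KIND alternates region, vertex, region, vertex, ...
        if len(kind) % 2 or not all(atom.startswith("%:") for atom in kind[0::2]):
            raise ValueError(f"malformed KIND list for {name}")
        scene[name] = {"XCOR": float(x), "YCOR": float(y), "KIND": kind}
    return scene


def regions_of(scene: Dict[str, Dict[str, Any]]) -> Set[str]:
    """Every region mentioned in some KIND list."""
    return {atom for props in scene.values() for atom in props["KIND"][0::2]}


# Three vertex lines of scene R3.
r3_part = """
%H 6.9 15.4 (%:7 %L %:3 %K %:5 %I)
%J 11.8 12.6 (%:7 %I %:5 %K %:6 %N)
%K 10.0 11.9 (%:6 %J %:5 %H %:3 %M)
"""
scene = read_input_format(r3_part)
print(scene["%J"])
```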



*For the moment, ignore the % signs. They are used to distinguish 
right from left scenes. 
Property lists in Lisp.*

Each atomic expression in Lisp has a property list, which is a place
where facts can be stored.

If it is desired to represent the fact that John is a 69-year-old
male, has a wife called Jacqueline, and has a height of 1.77 m,
we could proceed in Lisp as follows:

(1) We will agree that the atom 'JOHN' will represent our man. 

(2) In the property list of 'JOHN' we will store several properties
or indicators and their values, using the function PUTPROP, which
stores information in the property list; thus

(Putprop (quote John) (quote Jacqueline) (quote Wife))
will add, under the indicator or property 'Wife', the value
'Jacqueline':

JOHN
 |
WIFE - JACQUELINE

(3) Hence, the representation of our facts in Lisp is

JOHN
 |
SEX - MALE
 |
AGE - 69.0
 |
WIFE - JACQUELINE
 |
HEIGHT - (1.77 m)

(4) In fact, the property list of 'JOHN', which is the CDR of 'JOHN'
in Lisp 1.6 {MAC M 313}, is

(SEX MALE AGE 69.0 WIFE JACQUELINE HEIGHT (1.77 m) ...)

(5) If later we want to know the age of John, we will ask 

(Get (quote John) (quote Age)) 
and the value will be 69.0 
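For readers without a Lisp system at hand, a dictionary per atom gives a rough Python analogue of steps (1) to (5). The functions putprop, get and plist below are illustrative stand-ins, not Lisp 1.6 functions; the argument order of putprop follows the PUTPROP call above, and the flat list is in insertion order, which need not match the order Lisp 1.6 uses.

```python
from typing import Any, Dict, List

# atom -> {indicator: value}; a dict stands in for the Lisp property list.
_plists: Dict[str, Dict[str, Any]] = {}


def putprop(atom: str, value: Any, indicator: str) -> Any:
    """Store `value` under `indicator`, argument order as in (PUTPROP atom value indicator)."""
    _plists.setdefault(atom, {})[indicator] = value
    return value


def get(atom: str, indicator: str) -> Any:
    """Look up an indicator; None plays the role of NIL when it is absent."""
    return _plists.get(atom, {}).get(indicator)


def plist(atom: str) -> List[Any]:
    """Flat indicator/value list, in the order the properties were stored."""
    flat: List[Any] = []
    for indicator, value in _plists.get(atom, {}).items():
        flat += [indicator, value]
    return flat


putprop("JOHN", "MALE", "SEX")
putprop("JOHN", 69.0, "AGE")
putprop("JOHN", "JACQUELINE", "WIFE")
putprop("JOHN", "1.77 m", "HEIGHT")
print(get("JOHN", "AGE"), plist("JOHN"))
```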



*This paragraph, which can be skipped by readers who know what a
property list is, will make the next section clearer.
INTERNAL FORMAT



The program expects the scene in a special symbolic format,
which is basically an arrangement of relations between vertices and
regions, represented by atoms having adequate properties
in their property lists.

A scene has a name which identifies it; this name is an atom
whose property list contains the properties 'REGIONS', 'VERTICES',
and 'BACKGROUND'. For example, the scene R3 (see figure 'R3') has the
name 'R3'. In the property list of R3 we find (see also table 'R3 IN
INTERNAL FORMAT'):

REGIONS (%:6 %:5 %:3 %:2 %:1 %:4 %:7)
Unordered list of regions composing the scene R3.

VERTICES (%N %M %L %K %J %I %H %G %F %E %D %C %B %A)
Unordered list of vertices composing the scene R3.

BACKGROUND (%:7)
Unordered list of regions composing the background of scene R3.

Region 

A region corresponds to a surface limited by simple closed curves. 

Regions are represented by atoms that start with a colon (:). For instance, 

in R3, the surface delimited by the vertices K J N M is a region, 

called :6, but D E F G A C is not. 

Each region has as its name an atom which possesses additional
properties describing different attributes of the region in question.
These are 'NEIGHBORS', 'KVERTICES', and 'FOOP'. For example, the region
in scene R3 formed by the lines DE, EF, FC, CD has ':2' as its name.
In the property list of :2 we find:

NEIGHBORS (%:4 %:7 %:7 %:1)
Counterclockwise ordered list of all regions which are neighbors to
:2. For each region, this list is unique up to cyclic permutation.

KVERTICES (%F %E %D %C)
Counterclockwise ordered list of all vertices which belong to
region :2. This list is unique up to cyclic permutation.

FOOP ((%:4 %F %:7 %E %:7 %D %:1 %C))
Each sublist is a counterclockwise ordered list of alternating
neighbors and kvertices of :2. Each sublist is unique up to cyclic
permutation, and indicates a simple boundary.

Each sublist of the FOOP property of a region is formed by a 
man who walks on its boundary always having this region to his left, 
and takes note of the regions to his right and of the vertices which 
he finds in his way. 
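As the properties of :2 show, and as those of :7 below show again, NEIGHBORS and KVERTICES are the odd- and even-position elements of the FOOP sublists, run together. The sketch below (helper names hypothetical) recovers them, and also lists each side of a boundary with the region on the walker's right, using the convention that the neighbor written just before a vertex lies along the side that ends at that vertex. The data are those given for :2 and :7 in this section.

```python
from typing import List, Tuple


def neighbors_and_kvertices(foop: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Each FOOP sublist alternates neighbor, vertex, neighbor, vertex, ..."""
    neighbors: List[str] = []
    kvertices: List[str] = []
    for boundary in foop:
        neighbors += boundary[0::2]
        kvertices += boundary[1::2]
    return neighbors, kvertices


def boundary_edges(boundary: List[str]) -> List[Tuple[str, str, str]]:
    """(from-vertex, to-vertex, region on the walker's right) for each side."""
    vertices, neighbors = boundary[1::2], boundary[0::2]
    # The side ending at vertices[i] starts at the previous vertex (cyclically).
    return [(vertices[i - 1], vertices[i], neighbors[i]) for i in range(len(vertices))]


foop_2 = [["%:4", "%F", "%:7", "%E", "%:7", "%D", "%:1", "%C"]]
foop_7 = [["%:6", "%N", "%:6", "%M", "%:3", "%L", "%:3", "%H", "%:5", "%I", "%:5", "%J"],
          ["%:2", "%E", "%:2", "%F", "%:4", "%G", "%:4", "%A", "%:1", "%B", "%:1", "%D"]]
print(neighbors_and_kvertices(foop_2))
print(boundary_edges(foop_2[0]))
```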

As another example, in the property list of :7 we find:

NEIGHBORS (%:6 %:6 %:3 %:3 %:5 %:5 %:2 %:2 %:4 %:4 %:1 %:1)

KVERTICES (%N %M %L %H %I %J %E %F %G %A %B %D)

FOOP ((%:6 %N %:6 %M %:3 %L %:3 %H %:5 %I %:5 %J)
      (%:2 %E %:2 %F %:4 %G %:4 %A %:1 %B %:1 %D))



Vertex

A vertex is the point where two or more lines of the scene
meet; for instance, A, G, and K are vertices of the scene R3. Each
vertex has as its name an atom which possesses additional properties
describing different attributes of the vertex in question. These are
'XCOR', 'YCOR', 'NVERTICES', 'NREGIONS', 'KIND', 'TYPE', and 'NEXTE'.
For example, vertex J (see scene R3) has in its property list:



XCOR 11.799999
x-coordinate

YCOR 12.600000
y-coordinate

NVERTICES (%I %K %N)
Counterclockwise ordered list of vertices to which J is connected.
Unique up to cyclic permutation.

NREGIONS (%:7 %:5 %:6)
Counterclockwise ordered list of regions to which J is connected.
Unique up to cyclic permutation.

KIND (%:7 %I %:5 %K %:6 %N)
Counterclockwise ordered list of alternating nregions and nvertices
of J. This list is unique up to cyclic permutation.

TYPE (ARROW (%K %J %I %N %:5 %:6 %:7))
List of two elements; the first is an atom indicating the type-name
of J; the second is the datum of J. To be explained in the next section.

(NEXTE)
Vertex J does not have the indicator NEXTE in its property list.

The KIND property of a vertex is formed by a man who stands at 
the vertex and, while rotating counterclockwise, takes note of the 
regions and vertices which he sees. NREGIONS and NVERTICES are then 
easily derived from KIND, by taking its odd positioned elements, and 
its even positioned elements, respectively. 
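Both derivations, and the phrase 'unique up to cyclic permutation', are easy to state in code. In this Python sketch (names hypothetical), rotations of a KIND list are restricted to even shifts, so that regions stay in region positions; the data are the properties of vertex J listed above.

```python
from typing import List, Sequence, Tuple


def nregions_nvertices(kind: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Odd positions (1st, 3rd, ...) are regions; even positions are vertices."""
    return list(kind[0::2]), list(kind[1::2])


def same_up_to_rotation(a: Sequence[str], b: Sequence[str], step: int = 1) -> bool:
    """True if b is a cyclic permutation of a. Use step=2 for KIND or FOOP
    lists, so that regions stay in region positions."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(list(a[i:]) + list(a[:i]) == list(b) for i in range(0, len(a), step))


kind_j = ["%:7", "%I", "%:5", "%K", "%:6", "%N"]
print(nregions_nvertices(kind_j))
```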

NEXTE is a property that appears in certain vertices (none in
scene R3); it will be explained in the next section.

The property TYPE is also stored by the function PREPARA; it
classifies each vertex into one of several types, as described in
table 'VERTICES' (next page).




'L'.- Vertex where two lines meet.

'FORK'.- Three lines forming angles smaller than 180 degrees.

'ARROW'.- Three lines meeting at a point, with one of the angles
bigger than 180 degrees.

'T'.- Three concurrent lines, two of them collinear.

'K'.- Two of the lines are collinear, and the other two fall on the
same side of such lines.

'X'.- Two of the lines are collinear, and the other two fall on
opposite sides of such lines.

'PEAK'.- Formed by four or more lines, when there is an angle bigger
than 180°.

'MULTI'.- Vertices formed by four or more lines, and not falling in
any of the preceding types.

TABLE 'VERTICES'
Classification of rectilinear vertices.
TYPES OF VERTICES

The disposition, slope and number of lines which form a vertex
are used to classify it. This task is performed by the function
(TYPEGENERATOR L), which stores the corresponding type in the
property list of the vertex.

The TYPE of a vertex is always a list of two elements; the first
is the type-name: one of 'L', 'FORK', 'ARROW', 'T', 'K', 'X', 'PEAK',
'MULTI'; the second element is the datum, which generally is a list
whose form varies with the type-name and which contains information,
in a fixed order, about the vertex in question (see table 'VERTICES').
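The classification of table 'VERTICES' can be sketched in a few lines. The following Python sketch is not TYPEGENERATOR itself: it assumes each vertex is given as the directions (in degrees) of the lines leaving it, the tolerance `tol` used to decide "collinear" is a hypothetical parameter, and only the type-name is returned, not the datum.

```python
import math


def _gaps(dirs):
    """Counterclockwise angles (degrees) between consecutive lines."""
    s = sorted(d % 360.0 for d in dirs)
    n = len(s)
    return [(s[(i + 1) % n] - s[i]) % 360.0 for i in range(n)]


def _collinear_pairs(dirs, tol):
    """Index pairs of lines that point in opposite directions (within tol)."""
    pairs = []
    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            diff = (dirs[j] - dirs[i]) % 360.0
            if abs(diff - 180.0) <= tol:
                pairs.append((i, j))
    return pairs


def classify_vertex(dirs, tol=2.0):
    """Return the type-name of a vertex whose lines leave it at angles dirs."""
    n = len(dirs)
    if n < 2:
        raise ValueError("a vertex needs at least two lines")
    if n == 2:
        return "L"
    big_angle = max(_gaps(dirs)) > 180.0 + tol   # some angle bigger than 180
    pairs = _collinear_pairs(dirs, tol)
    if n == 3:
        if pairs:
            return "T"
        return "ARROW" if big_angle else "FORK"
    if n == 4 and pairs:
        i, j = pairs[0]
        others = [d for k, d in enumerate(dirs) if k not in (i, j)]
        # the sign of sin(d - base) tells on which side of the collinear line d falls
        s1, s2 = (math.sin(math.radians(d - dirs[i])) for d in others)
        return "K" if s1 * s2 > 0 else "X"
    return "PEAK" if big_angle else "MULTI"
```

For instance, lines at 0, 90 and 180 degrees give a 'T', while lines at 0, 30 and 60 degrees leave a 300-degree angle and give an 'ARROW'.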

Vertices where two lines meet. 



L.- A vertex formed by only two lines is always classified as of type 'L'.
Two angles exist at it, one bigger and the other smaller than 180°. The
datum is a list of the form (E1 E2), where

E1 is the region which contains the angle smaller than 180°,
E2 is the region which contains the angle greater than 180°.

For instance, in scene R3 (see fig. 'R3'), G has in its property list:

TYPE (L (%:4 %:7))

The vertices of type L present in R3 are B, E, G, I, L, N.



Vertices where three lines meet. 




FORK.- Three lines meeting at a point and forming angles smaller than
180° form a FORK. Its datum is the vertex itself at which the fork
occurs. For instance, vertex K has in its property list

TYPE (FORK %K)

The vertices of type FORK present in R3 are C, K.



ARROW.- Three lines meeting at a point, with one of the angles bigger
than 180°. The datum of an ARROW is a list like (E1 E2 E3 E4 E5 E6 E7),
where

E1 is the vertex at the 'tail',
E2 is the vertex at the center,
E3 is the vertex at the left,
E4 is the vertex at the right,
E5 is the region at the left,
E6 is the region at the right,
E7 is the region which contains the angle bigger than 180°.

For instance, vertex H has in its property list

TYPE (ARROW (%K %H %L %I %:3 %:5 %:7))

The vertices of type ARROW present in R3 are A, D, H, J, M.

T.- Three concurrent lines, of which two are collinear. The datum for a
T is a list of the form (E1 E2 E3 E4 E5 E6 E7), where

E1 is the vertex at the 'tail' of the T,
E2 is the central vertex,
E3 is a vertex such that E1 E2 E3 is an angle between 90 and 180 degrees,
E4 is a vertex such that E1 E2 E4 is an angle smaller than 90 degrees
(that is, E3 E2 E4 are collinear),
E5 is the region which contains the angle between 90 and 180 degrees,
E6 is the region which contains the angle smaller than 90 degrees,
E7 is the "central" region (where the 180° angle is).

For instance, vertex F (fig. R3) has in its property list

TYPE (T (%C %F %G %E %:2 %:4 %:7))

The only vertex of type T present in R3 is F.
See also "NEXTEs or Matching T's" below.

Vertices where four lines meet. 




K.- When two of the lines are collinear, and the other two fall on the
same side of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6 E7 E8), where

E1 is the central region,
E2 is the region having the 180° angle,
E3 is the collinear vertex which falls to the left of E1-E2,
E4 is the region to the left of E1-E2,
E5 is the vertex to the left of E1-E2,
E6 is the collinear vertex which falls to the right,
E7 is the region to the right of E1-E2,
E8 is the other vertex to the right.

R3 contains no vertices of type K. PA of figure BRIDGE is of type 'K'.
X.- When two of the lines are collinear, and the other two fall on
opposite sides of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6), where

E1 is one of the collinear vertices,
E2 is the region to the left of E1 C, where C is the vertex at the center,
E3 is the region to the right of E1 C,
E4 is the other collinear vertex,
E5 is the region to the left of E4 C,
E6 is the region to the right of E4 C.

For instance, we find in the property list of F (figure BRIDGE):

TYPE (X (QA :26 :22 G :21 :30))

The only vertex of type X present in BRIDGE is F.

The datum for an X may also be in the form (E4 E5 E6 E1 E2 E3).
Vertices of four lines which are not of type K or X are either of
type PEAK or MULTI.

Other types of vertices. 

PEAK.- Formed by four or more lines, when there is an angle bigger
than 180°.

MULTI.- Vertices formed by four or more lines, and not falling in any
of the preceding types, belong to the type MULTI. R3 contains
no PEAKs or MULTIs.

The datum for vertices of type PEAK is of the form (E1 E2 E3), where
E1 is the region that contains the angle bigger than 180 degrees,
E2 is the vertex before E1, and E3 is the vertex after it (in the
counterclockwise sense).

The datum for vertices of type MULTI is of the form E1, where
E1 is the vertex itself.

NEXTEs or Matching T's. Two T's which are collinear and facing each other
(see figure) are called "matching T's", and each one is the "nexte"
of the other. The indicator NEXTE is placed in such vertices.
If the region E7 of a T (see figure) is the background, that
T can not be a matching T.
In the figure, E and F are matching T's because E1-E2 is
collinear with F2-F1. It is not required of E3-E4 to be parallel to
F3-F4. If several pairs of T's are possible, the closest is chosen:

F - Q are matching T's, and not P - R.

The matching T's are used to determine places where a body is
occluded by another object and later emerges visible again.
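The pairing of T's can be sketched as a geometric test. This is a hypothetical Python sketch, not SEE's code: each T is reduced to its tail E1, its center E2 and its central region E7, and the tolerance `tol` is an assumed parameter. Two T's are accepted if their stems lie on one line in the order E1, E2, F2, F1 and neither central region is the background; among several candidates the closest is chosen. The extended requirement that no region between them be background is not modeled here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TVertex:
    name: str
    tail: tuple          # E1, the vertex at the tail of the T, as (x, y)
    center: tuple        # E2, the central vertex, as (x, y)
    central_region: str  # E7, the region holding the 180-degree angle


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def _cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def could_match(e, f, background, tol=0.02):
    """True if T's e and f are collinear and lie in the order E1, E2, F2, F1."""
    if background in (e.central_region, f.central_region):
        return False
    stem_e = _sub(e.tail, e.center)   # assumed non-degenerate (tail != center)
    stem_f = _sub(f.tail, f.center)
    e_to_f = _sub(f.center, e.center)
    if e_to_f == (0, 0):
        return False
    for stem in (stem_e, stem_f):
        # |sin| of the angle between a stem and the line joining the centers
        sin = abs(_cross(stem, e_to_f)) / (math.hypot(*stem) * math.hypot(*e_to_f))
        if sin > tol:
            return False
    # both tails point away from the other T
    return _dot(stem_e, e_to_f) < 0 and _dot(stem_f, e_to_f) > 0


def find_nexte(e, candidates, background):
    """Among candidate T's that could match e, return the closest one (or None)."""
    options = [f for f in candidates if f is not e and could_match(e, f, background)]
    if not options:
        return None
    return min(options, key=lambda f: math.dist(e.center, f.center))
```

Choosing the nearest candidate mirrors the rule "if several pairs of T's are possible, the closest is chosen"; a real implementation would also need the background test along the whole gap.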
For two T's to be NEXTEs or matching T's, it is required that
neither E7 nor F7 be background. This requirement should be extended to
all regions between E7 and F7, since a line can not go "under" the
background region:



A and B can not be NEXTEs, since :11 is background.

Two straight lines always intersect (possibly at infinity); a way
to detect these background regions is to write functions (subroutines)
that find out if two segments of line intersect, or if one segment
intersects with a line.





LINES AND SEGMENTS
In the plane, two straight lines always meet.
Two segments, or a line and a segment, may or
may not meet. (A segment is a finite portion of a line.)
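The intersection tests called for above can be written with the standard orientation (cross-product) test. This is a generic computational-geometry sketch in Python, not the thesis's own subroutines; points are assumed to be (x, y) pairs with exact (for example integer) coordinates, and the function names are hypothetical.

```python
def orient(p, q, r):
    """Sign of (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_segment(p, q, r):
    """Assuming p, q, r are collinear: True if r lies within segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if segment ab meets segment cd (touching counts)."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing
    # collinear or touching cases
    if o1 == 0 and on_segment(a, b, c):
        return True
    if o2 == 0 and on_segment(a, b, d):
        return True
    if o3 == 0 and on_segment(c, d, a):
        return True
    if o4 == 0 and on_segment(c, d, b):
        return True
    return False


def line_meets_segment(p, q, c, d):
    """True if the infinite line through p and q meets segment cd."""
    o1, o2 = orient(p, q, c), orient(p, q, d)
    return o1 == 0 or o2 == 0 or o1 != o2
```

The line test is simpler than the segment test because an infinite line meets a segment exactly when the segment's endpoints do not lie strictly on the same side of it.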
FIGURE 'TOWER'
FIGURE 'MOMO'



THE PROGRAM 



We now describe SEE, and how it achieves its goals, by discussing 
the procedures, heuristics, etc., employed and the way they work. 
We begin with several examples. 

Example A. Scene 'TOWER'. This scene (see figure 'TOWER') is
analyzed by SEE, with the following results:

RESULTS
(BODY 1. IS :2 :3 :1)
(BODY 2. IS :15 :5 :4)
(BODY 3. IS :23 :18 :17)
(BODY 4. IS :6 :7 :8)
(BODY 5. IS :10 :11 :9)
(BODY 6. IS :13 :14 :12)
(BODY 7. IS :16 :22)
(BODY 8. IS :20 :19 :21)

Results for scene TOWER



Example B. Scene 'MOMO'. Details of the program's operation are
given (skip to the next page, if you wish).

↑Z $L SEE 1 ↵          Go to DDT and load file SEE 1 (in tape
                       GUZMAN F), a binary dump of the program SEE.
$G                     Start.
(UREAD MOMO SI 3) ↑Q   Read the file MOMO SI (in tape GUZMAN C)
                       from tape drive 3.
(PREPARA MOMO)         Convert MOMO from its Input Format form
                       to Internal Format, the form that SEE expects.
(SEE (QUOTE MOMO))     Call SEE to work on MOMO.

Results appear on the next page.

Notes: ↑Z (control Z) is keyed by striking the Z key while
simultaneously holding down the CONTROL key.

↵ denotes carriage return.

$ denotes the character "alt mode".
SEE 58 ANALYZES MOMO

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL
((NIL) ((:38) G0044 G0043 G0041 G0040) etc.
LOCAL
(LOCAL ASSUMES (:17) (:9) SAME BODY)
(LOCAL ASSUMES (:9 :17) (:18) SAME BODY)
((NIL) (NIL) ((:8)) (NIL) (NIL) (NIL) ((:38 :37 :39) G0043 etc.
LOCAL
(((:3 :2 :1) G0081 G0029 G0030 G0028) ((:32 :33 :27 :26) etc.
LOCAL
SMB
RESULTS
(BODY 1. IS :3 :2 :1)
(BODY 2. IS :32 :33 :27 :26)
(BODY 3. IS :28 :31)
(BODY 4. IS :20 :34 :19 :30 :29)
(BODY 5. IS :36 :35)
(BODY 6. IS :24 :5 :21 :4)
(BODY 7. IS :25 :23 :22)
(BODY 8. IS :14 :13 :15)
(BODY 9. IS :10 :16 :11 :12)
(BODY 10. IS :17 :18 :9)
(BODY 11. IS :7 :8)
(BODY 12. IS :38 :37 :39)
NIL

RESULTS FOR MOMO



Most of the scenes contain several "nasty" coincidences: a vertex of
an object lies precisely on the edge of another object; two nearly
parallel lines are merged into a single one; etc. This has been
done on purpose, since a non-sophisticated pre-processor will tend to
make this kind of error.



Example C. R3. Analysis by SEE gives

(BODY 1. IS %:2 %:1 %:4)
(BODY 2. IS %:6 %:5 %:3)

RESULTS FOR 'R3'

The '%' sign indicates the dextral scenes (cf. page 233). The signs
may be ignored.
The Parts of SEE. The program is straightforward; it does not call
itself recursively; it does not do "pattern matching"; it does not do
tree search. It is formed by several main parts, executed
sequentially. They are:

LINKS FORMATION. An analysis is made of vertices, regions and asso-
ciated information, in search of clues that indicate that two
regions form part of the same body. If evidence exists that
two regions in fact belong to the same body, they are linked
or marked with a "gensym" (both receive the same new label).*
There are two kinds of links, called strong (global) and weak
(local).

Some features of the scene will weakly suggest that a group
of regions should be considered together, as part of the same
body. This part of the program is the one that produces the
'local' links or evidences.

NUCLEI CONSOLIDATION. The 'strong' links gathered so far are
analyzed; regions are grouped into "nuclei" of bodies, which grow
until some conditions fail to be satisfied (a detailed explanation
follows later).

Weak evidence is taken into account for deciding which of
the unsatisfactory global links should be considered satisfactory,
and the corresponding nuclei of bodies are then joined to
form a single and bigger nucleus.

BODY RETOUCHING. If a single region does not belong to a larger
nucleus, but is linked by one strong evidence to another region,
it is incorporated into the nucleus of that other region. If
necessary, more nuclei consolidation could be done after this
step.

A last attempt is made to associate the remaining single
regions to other bodies.

The regions belonging to the background are screened out, and the
results are printed.

* In LISP, a "gensym" (generated symbol) is a new atomic symbol,
previously unused.
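The idea of linking by gensyms can be illustrated with a small hypothetical Python class (SEE itself works on LISP property lists). Each call to `link` invents a fresh symbol and puts it on the label lists of both regions; the number of links between two regions is then the number of symbols their lists share. The class and method names are invented for this sketch.

```python
import itertools
from collections import defaultdict


class LinkStore:
    """Links represented as shared gensyms (hypothetical sketch)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.labels = defaultdict(list)  # region name -> list of gensyms

    def gensym(self):
        """Return a new, previously unused symbol."""
        return "G%04d" % next(self._counter)

    def link(self, r, s):
        g = self.gensym()
        # newest symbol first, matching the descending order of the
        # gensyms in SEE's printouts
        self.labels[r].insert(0, g)
        self.labels[s].insert(0, g)
        return g

    def links_between(self, r, s):
        """Number of links joining regions r and s."""
        return len(set(self.labels[r]) & set(self.labels[s]))


store = LinkStore()
store.link(":1", ":2")
store.link(":1", ":2")
store.link(":2", ":3")
print(store.labels[":2"])  # ['G0003', 'G0002', 'G0001']
```

Because a link is just a shared label, counting links between two groups of regions (as nuclei consolidation needs) reduces to intersecting label sets.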
Auxiliary Routines

Three functions are used constantly, and will be described now.

THROUGHTES. "Through a chain of T's." Allows properties or configu-
rations to extend along straight lines; for instance, the property
"'A' has as neighbor an L" can be extended so as to say
"throughtes, 'A' has as neighbor an L".

Strict definition. A throughtes connection is defined as one of:

(1) a single point (meaning the two vertices on both sides of the
connection are in fact the same);

(2) a pair of matching T's;

(3) and (4): see figures.

See also the annotations on the listing.

GOODT. If a vertex V is considered a "good T", (GOODT V) is TRUE;
false otherwise.

(GOODT V) = F if V is not a "T";
            F if [figure];
            T if V has a NEXTE;
            F if [figure: parallel lines];
            F if [figure: parallel lines];
            T otherwise.
As we see, this function tries to distinguish between T's originated by
occlusion, such as O, and T's originated by accident (A).

NOSABO. "Not same body." Acts as a link inhibitor.

If consulted, (NOSABO .. V ..) will inhibit, in the following condi-
tions, the link that vertex V may have created:

[Figures: conditions (1) to (5), involving an ARROW, a PEAK and
other vertices next to V; the marked link is the inhibited link
(prohibited, ignored, forbidden, not created).]



Nosabo tries to find conditions indicating that two regions should
not be considered as part of the same body; hence, if consulted,
Nosabo may forbid a link between them. Some heuristics place links
without asking Nosabo's approval, and Nosabo can not "erase" a link
placed without its authorization.

If none of conditions (1) to (5) is met, Nosabo will be False,
indicating no inhibition was found, and it is up to the program that
asked Nosabo's opinion to lay or fail to lay the link in question.
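The division of labor between a link-laying heuristic and its optional inhibitor can be sketched as follows, using the FORK rule described under EVERTICES as the example. This is a hypothetical Python sketch: `place_links`, `fork_rule` and the toy inhibitor are invented names, and the toy inhibitor merely vetoes links touching region :9 to mimic the vertex JB example of figure 'BRIDGE'; the real Nosabo checks conditions (1) to (5).

```python
def place_links(vertex, proposals, nosabo=None):
    """Keep each proposed link unless the inhibitor, if consulted, forbids it.

    nosabo=None models a heuristic that places links without asking
    Nosabo's approval.
    """
    if nosabo is None:
        return list(proposals)
    return [(r, s) for r, s in proposals if not nosabo(vertex, r, s)]


def fork_rule(vertex, regions, background, nosabo):
    """FORK: no link if any of the three regions is background;
    otherwise one link per pair of regions, each subject to inhibition."""
    if background in regions:
        return []
    a, b, c = regions
    return place_links(vertex, [(a, b), (a, c), (b, c)], nosabo)


def toy_nosabo(vertex, r, s):
    # stand-in inhibitor: forbid any link that touches region :9
    return ":9" in (r, s)


print(fork_rule("JB", (":5", ":8", ":9"), ":30", toy_nosabo))  # [(':5', ':8')]
```

Keeping the inhibitor as a separate, optional argument reflects the text: Nosabo never acts on its own, and links placed without consulting it can not be erased by it.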
We proceed now to explain in considerable detail each of the parts
of SEE. This will help the reader to understand the behavior of 
the program, its strengths and deficiencies. 



LINK FORMATION 

Several subroutines are devoted to creating weak and strong 
links. See also Listing. 



FLEAS. Removes several unwanted properties.

EVERTICES. Each vertex is considered under the following rules:

L.- No evidence is created directly by this type of vertex.
Nevertheless, the "L" is used in many combinations with other
vertices to account for evidence. As we saw, Nosabo uses L's.
"Legs" will use them, too.

FORK.- No link is created if any of the three regions is background
(but see below).
Example (unless otherwise indicated, all examples are from figure
'BRIDGE', page 94): Vertex J does not generate links.
Otherwise, three links are created as shown, except that each one
may be inhibited by Nosabo.
Example. Vertex JB only produces link :5-:8. Link :5-:9 is inhibited
because S is a 'T'; Nosabo also forbids link :8-:9 because KB is an
'arrow'.
This last rule is the most powerful of the heuristics.
Two links are created as shown, without asking Nosabo, if the fork
is connected to the central line of an arrow.
Example: In fig. R19, PA generates links :29-:17 and :35-:17.
This last heuristic is of help where there are concave objects
(Fig. R19).



ARROW.- Link if an L is connected to its central line, and the shaded
region contains only that arrow as a "proper-arrow", and no Forks.
(Region :1 contains arrow A as a "proper-arrow"; so does region :2,
but not region :3. Capisce?)
Example. RB links MO with :4.
This allows "lateral faces" of legs to be properly identified and
agglutinated.
Otherwise, link, except if inhibited by Nosabo.
Example. D lays a link between :26 and :23.
Powerful and general heuristic.

X.- No link if the X comes from the intersection of two lines.
Otherwise, link as shown, except if Nosabo disagrees.
Example. G originates links :26-:22 and :21-:30; this last one will
later be erased or disregarded, since :30 is the background.
No link.

PEAK.- Links are established between contiguous regions, except those
to the region containing the angle bigger than 180°. These links are
subject to Nosabo inhibition.
Example. In fig. 'CORN', JJ generates links :8-:9 and :9-:10.
Of certain use, especially with pyramids and "pointy" objects.

MULTI.- No link. The reason is:
(1) if the vertex is "genuine", although it generates no links, the
object having it will probably possess many other vertices, through
which links will get established, and
(2) if the vertex is "false" because it is the result of the casual
coincidence of two or more genuine vertices, mistakes are avoided by
abstaining from generating links. This is generally the case.
SUGGESTION. An improvement is possible, by allowing MULTI vertices to
place links.

T.- If matching T's, link as shown, without consulting Nosabo. Avoid
linking to the background.
Each pair of matching T's produces these links only once; that is, we
do not produce two links while analyzing A and another two at B.
Do not link if the middle region of a 'T' is the background.
What we are trying to do here is to find places where a body appears
as two disconnected parts.
Link (without Nosabo's consent) as shown if the central segment of the
'T' separates two non-background regions, and these have the
background as neighbor, and part of the separations between background
and non-background are parallel to the central segment of the 'T'.
Avoid double links in the following case (link just once): [figure].
Example. TA links :21 with :27 (F-G, RA-TA and JA-IA are parallel).
Favors occluded bodies with parallel faces.
Also, see "STUDY" in listing, still an experimental feature.
Two links are placed as shown (without asking Nosabo) if the central
line of the T is connected to the central line of an arrow.
It is of help where there are concave objects.

Table 'GLOBAL EVIDENCE' shows compactly the main rules just discussed.



LOCALEVIDENCE

Weak or local links are laid here; they are used to indicate, in a
feebler way, that two faces or regions may be part of the same object.

Nosabo can not inhibit local links.

LEG.- A weak link is placed as shown (dotted) if, throughtes, an L is
connected to an Arrow, and the two indicated edges are parallel.
We call this configuration 'Leg'.
Example (all examples from figure 'BRIDGE', except if otherwise
indicated). Vertex FA is a Leg (FA - QB is parallel to EA - DA)
that links weakly :18 with :19.

In a Leg, if there are two matching T's as shown, a weak link is
placed correspondingly.
Example. In fig. 'TRIAL' (page 88), a weak link or evidence is placed
between :7 and :4, because EE is a Leg, and L and E are matching T's.



The heuristics described will sometimes produce a "wrong linkage",
linking two regions that do not belong to the same body. These mistakes
are not likely to confuse SEE, since the handling of these links (and
all of SEE, in general) is done under the assumption or knowledge that
the information is noisy and somewhat unreliable.

TABLE 'GLOBAL EVIDENCE'
Panels (A) through (I) show the main global-evidence rules schematically.
Strong links are shown dotted; weak links are not shown.
TRIANGLE.- A Triangle is a region with three vertices, two of which are
interconnected T's, the type of the other vertex being irrelevant.

Two triangles are weakly linked if they are
(1) facing each other, and
(2) "properly contained", meaning that D has to fall on the same side
of AB as C does, and similarly for the other vertices, and
(3) AB is parallel to EF, and AC to DE.

The heuristic helps with faces of a prism that is badly obscured. It
does not help much, since it gives only a weak link. On the other
hand, this weakness prevents mistakes when the two triangles are not
from the same body.

SUGGESTION. A possible improvement consists of choosing the closest of
two triangles, if several candidates are possible.

Example. In figure 'WRIST' (page 156), weak links are placed between
triangles 5 and 6, and between 1 and 2.
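Conditions (2) and (3) rest on two elementary geometric tests, sketched below in Python. These are generic helpers written for illustration (the names `same_side` and `parallel` and the tolerance `tol` are assumptions, not SEE's functions); points are (x, y) pairs.

```python
def _cross(o, a, b):
    """Cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def same_side(a, b, c, d):
    """True if points c and d fall strictly on the same side of line AB."""
    return _cross(a, b, c) * _cross(a, b, d) > 0


def parallel(p1, p2, q1, q2, tol=1e-6):
    """True if segment p1-p2 is parallel to segment q1-q2 (within tol)."""
    u = (p2[0] - p1[0], p2[1] - p1[1])
    v = (q2[0] - q1[0], q2[1] - q1[1])
    norm = (u[0] ** 2 + u[1] ** 2) ** 0.5 * (v[0] ** 2 + v[1] ** 2) ** 0.5
    # |u x v| = |u||v| sin(angle); compare against a relative tolerance
    return abs(u[0] * v[1] - u[1] * v[0]) <= tol * norm
```

With these, condition (2) becomes a call such as `same_side(A, B, C, D)`, and condition (3) becomes `parallel(A, B, E, F)` together with `parallel(A, C, D, E)`. The same `parallel` test would also serve the Leg rule.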

Example. Figure 'TRIAL' receives the following strong links (full
lines) and weak links (dotted lines).

FIGURE TRIAL

The program analyzes this scene and finds 3 bodies:

(BODY 1. IS :6 :2 :1)
(BODY 2. IS :11 :12 :10)
(BODY 3. IS :4 :9 :5 :7 :3 :8 :13)
FIGURE 'TRIAL'

The links could be represented as

Figure 'TRIAL - LINKS'
Strong (solid) and weak (broken lines) links of figure 'TRIAL'.

SEE prints these links in the following way (cf. also p. 110):

((NIL) ((:11) G0014 G0013 G0011 G0010) ((:12) G0015 G0014 G0013
G0012) ((:13) G0021) ((:9) G0022 G0021 G0020 G0019 G0017 G0016)
((:10) G0015 G0012 G0011 G0010) etc.

Region :11 has four links emanating from itself.
Strong links of 'TRIAL'.



Weak links of scene 'TRIAL' are

((:2 :1) (:6 :2) (:6 :1) (:4 :5) (:9 :5) (:13 :9) (:3 :8) (:9 :8)
(:4 :7) (:9 :7) (:12 :10) (:11 :12))

There is a weak link between :12 and :10.
Weak links of 'TRIAL'.
The next step is to gather all this evidence and to form tentative
hypotheses of objects as assemblages of faces with many links among 
them. 

NUCLEI CONSOLIDATION 

All the links to the background are deleted, since it can not 
be part of any body. 

Strong and weak links exist among the different regions of a 
scene. They are consolidated in that order by two subroutines, 
Global and Local. 



GLOBAL

Groups of faces with an abundance of strong links among them
are first found; these "nuclei" will later compete for other faces
more loosely linked.

Definition : a nucleus (of a body) is either a region or a set of 

regions that has been formed by the following rule. 

Rule: If two nuclei are connected by two or more strong links, 
they are merged into a larger nucleus. 

More detailed rules appear in the section 'Simplified
view of Scene Analysis'.

For instance, in the figure below, regions :1 and :2 are put
together, because there exist two links between them, to form nucleus
:1-2. Now we see that region :3 has two links with this nucleus :1-2,
and therefore the new nucleus :1-2-3 is formed.

Fig. 'CONSOLIDATION'
Two links between two nuclei merge them.

We let the nuclei grow and merge under the former rule, until 
no new nuclei can be formed. 
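The growth rule above can be sketched as a fixed-point loop. This is a hypothetical Python sketch of the simplified rule only (two or more strong links between two nuclei merge them), not SEE's GLOBAL: nuclei are frozensets of region names, strong links are a list of region pairs (repeats allowed), and `threshold` is an assumed parameter.

```python
from collections import Counter


def consolidate(regions, strong_links, threshold=2):
    """Merge nuclei joined by >= threshold strong links, until no merge applies."""
    nucleus_of = {r: frozenset([r]) for r in regions}  # every region starts alone
    changed = True
    while changed:
        changed = False
        counts = Counter()
        for r, s in strong_links:
            a, b = nucleus_of[r], nucleus_of[s]
            if a != b:                      # links inside a nucleus are ignored
                counts[frozenset((a, b))] += 1
        for pair, n in counts.items():
            if n >= threshold:
                a, b = tuple(pair)
                merged = a | b
                for r in merged:
                    nucleus_of[r] = merged
                changed = True
                break                       # recount after every merge
    return set(nucleus_of.values())


links = [(":1", ":2"), (":1", ":2"), (":3", ":1"), (":3", ":2")]
print(consolidate([":1", ":2", ":3"], links))
```

On the figure 'CONSOLIDATION' links, the first pass merges :1 and :2; on the recount, :3 has two links with nucleus :1-2 and joins it, giving the single nucleus :1-2-3. When the loop stops, any two remaining nuclei share at most one strong link, which is the "maximal" state described next.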
When this is the case, the scene has been partitioned into
several "maximal" nuclei; between any two of these there is at most
one link. For example, figure 'TRIAL-LINKS' will be transformed into
figure 'TRIAL-NUCLEI'.

Figure 'TRIAL - NUCLEI'
Maximal nuclei of scene TRIAL.



LOCAL 



If some strong link joining two "maximal" nuclei is also 
reinforced by a weak link, these nuclei are merged. 

The weak links of figure TRIAL are shown as dotted lines in 
figure 'TRIAL-LINKS' (page 90); they transform figure 'TRIAL-NUCLEI' 
into figure 'TRIAL-FINAL'. 
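The LOCAL step can be sketched in the same hypothetical style. The text does not say whether the reinforcing weak link must join the same two regions or only the same two nuclei; this Python sketch assumes the latter, and its function name is invented.

```python
def local_merge(nuclei, strong_links, weak_links):
    """Merge two maximal nuclei when a strong link between them is
    reinforced by a weak link between the same two nuclei."""
    nucleus_of = {r: n for n in nuclei for r in n}

    def joined(links):
        """Pairs of distinct nuclei joined by at least one of the links."""
        out = set()
        for r, s in links:
            a, b = nucleus_of[r], nucleus_of[s]
            if a != b:
                out.add(frozenset((a, b)))
        return out

    while True:
        both = joined(strong_links) & joined(weak_links)
        if not both:
            return set(nucleus_of.values())
        a, b = tuple(next(iter(both)))
        merged = a | b
        for r in merged:
            nucleus_of[r] = merged


nuclei = [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})]
print(local_merge(nuclei, [("b", "c"), ("c", "d")], [("a", "c")]))
```

Here the strong link b-c is reinforced by the weak link a-c, so nucleus a-b absorbs c; the strong link c-d has no weak support and d stays apart. A weak link alone never merges anything.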
Figure 'TRIAL - FINAL'
Nuclei of scene TRIAL after merging 
suggested by local links. 



BODY RETOUCHING

Additional heuristics assign unsatisfactory faces to existing
nuclei, or isolate them. SINGLEBODY and SMB are used for this task.

SINGLEBODY. A strong link joining a nucleus and another nucleus composed
of a single region is considered enough evidence to merge the nuclei in
question if there is no other link emanating from the single region. A
message is printed indicating these merges.

This rule produces no change in fig. 'TRIAL-FINAL', and therefore
its nuclei will be reported as bodies.
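The SINGLEBODY rule can be sketched as follows. This is a hypothetical Python sketch, not SEE's function: it assumes `links` holds the links still joining different nuclei, and it absorbs a one-region nucleus only if exactly one such link leaves it. The region names echo the BRIDGE discussion, but treating :28 as a single region is purely illustrative.

```python
def singlebody(nuclei, links):
    """Absorb each one-region nucleus that has exactly one link leaving it."""
    nucleus_of = {r: n for n in nuclei for r in n}
    singles = sorted(r for n in nuclei if len(n) == 1 for r in n)
    for region in singles:
        here = nucleus_of[region]
        if len(here) != 1:
            continue  # already absorbed into a bigger nucleus
        others = [s if r == region else r
                  for r, s in links if region in (r, s)]
        if len(others) != 1:
            continue  # no link at all, or several nuclei claim the region
        merged = nucleus_of[others[0]] | here
        for r in merged:
            nucleus_of[r] = merged
    return set(nucleus_of.values())


nuclei = [frozenset({":16"}), frozenset({":18", ":19"}), frozenset({":28"}),
          frozenset({":26", ":22"}), frozenset({":24", ":25"})]
links = [(":16", ":19"), (":28", ":22"), (":28", ":24")]
print(singlebody(nuclei, links))
```

Region :16 has a single owner and is absorbed into :18-19, while :28, claimed by two nuclei, is left alone.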

A more complex example shows the retouching operation. Figure 
'BRIDGE' undergoes these transformations: 



Scene BRIDGE

Fig. 'LINKS-BRIDGE': weak and strong links among regions.
Fig. 'NUCLEI-BRIDGE': maximal nuclei (2 or more strong links).
Fig. 'NEW-NUCLEI-BRIDGE': maximal nuclei enlarged by weak link action.
Fig. 'FINAL-BRIDGE': id. enlarged by single undisputed regions
(single region, single strong link).
Fig. 'FINAL-BRIDGE' (no change in this case): id. enlarged by good
neighbors, "goodpal". Final result.

FIGURE 'BRIDGE'
FIGURE 'LINKS-BRIDGE'
Weak links are not ...
We see that in figure 'NEW-NUCLEI-BRIDGE', nucleus :16 is merged
by SINGLEBODY with nucleus :18-19 (see figure 'FINAL-BRIDGE'). Nucleus
:28-29 is not joined with :26-22-23 or with :24-25-27-12-21-9. Even if
nucleus :28-29 were composed of a single region, it still would not be
merged, since two links emerge from it: two nuclei claim its possession.

This rule joins single regions having only one possible "owner"
nucleus.

Two systems of links are used by SEE. One consists of weak and
strong links, produced by examining each vertex, and culminates in the
formation of nuclei under GLOBAL, LOCAL, etc.

The second system constitutes a different network of links; SMB
works in the second system. It is motivated by the desire to collect
evidence not directly available through the vertices. It gathers
evidence from the lines or boundaries separating two regions, in an
effort to answer the question: are two given neighboring regions part
of the same object, or are they not? That is, are two contiguous regions
"good neighbors" ("good pals")? If they are, a special link, the s-link,
is placed, eventually forming a network independent of weak and strong
links, that will collapse in a somewhat peculiar way. A great amount of
unnecessary duplication is thus possible in the information carried by
both systems of links. To reduce it, the s-links are designed
to complement and extend, rather than to re-do, the agglutination
produced by weak+strong links. They (the s-links) will, therefore, mainly
study single faces not satisfactorily accounted for.

SMB uses the predicate (GOODPAL R S), which acquires the value T
(true) if R and S are two contiguous "good neighbor" regions.

To satisfy this, their common boundary must not be empty, and must
lack L's, FORKs, ARROWs, K's, X's, PEAKs, MULTIs. In addition:

[figure] Not good: (GOODPAL R S) = F.

[figure] Not good: (GOODPAL R S) = F, where the marked vertex is an
"L" or (in general) a vertex that makes (NOSABO R S) true.

O.K. otherwise: (GOODPAL R S) = T.
In particular, [figure] is O.K. if (NOSABO R S) = F.

SMB analyzes the nuclei formed under weak+strong links that, after
SINGLEBODY has acted, still consist of a single face or region. The
steps are:

1. A network of s-links is formed by putting an s-link between each
region forming a nucleus all by itself and its goodpal neighbors.

2. If exactly one nucleus is s-linked to one of those regions (that
is to say, if such a single-region nucleus has precisely one goodpal),
the region gets absorbed by the nucleus; otherwise the region is
reported as a body in itself (consisting of a single region).




[Figure: no change occurs, because :3 has two s-links.]
Note that

a. The s-links are not used to form nuclei as the weak+strong links
were; they only help certain isolated faces to join bigger structures.

b. Two s-links between two regions have the effect of one.
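The two SMB steps can be sketched in Python under stated assumptions. This hypothetical sketch takes the neighbor relation as a dictionary and the GOODPAL predicate as a function; it follows the "exactly one nucleus is s-linked" reading of step 2, and it stores s-links in sets so that two s-links between the same regions count as one (note b). The names and the sample data are invented for illustration.

```python
def smb(nuclei, neighbors, goodpal):
    """Absorb single-region nuclei whose s-links reach exactly one other nucleus."""
    nucleus_of = {r: n for n in nuclei for r in n}
    singles = sorted(r for n in nuclei if len(n) == 1 for r in n)
    # Step 1: s-links between each single region and its goodpal neighbors.
    s_links = {r: {s for s in neighbors.get(r, ()) if goodpal(r, s)}
               for r in singles}
    # Step 2: absorb the region if exactly one (other) nucleus is s-linked to it.
    for r in singles:
        owners = {nucleus_of[p] for p in s_links[r]} - {nucleus_of[r]}
        if len(owners) == 1:
            merged = owners.pop() | nucleus_of[r]
            for x in merged:
                nucleus_of[x] = merged
    return set(nucleus_of.values())


nuclei = [frozenset({":6"}), frozenset({":7"}), frozenset({":3"}),
          frozenset({":1", ":9"}), frozenset({":2", ":8"}), frozenset({":4", ":5"})]
neighbors = {":6": [":7", ":5"], ":7": [":6", ":34"], ":3": [":1", ":2"]}
good = {(":6", ":7"), (":7", ":6"), (":3", ":1"), (":3", ":2")}
result = smb(nuclei, neighbors, lambda r, s: (r, s) in good)
```

In this toy data :6 and :7 end up together, as in figure 'HARD', while :3 has goodpals in two different nuclei and stays a body by itself.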

Example. In figure 'HARD', regions :6 and :7 get joined by SMB.
FIGURE 'HARD'
This scene shows the use of SMB.
SEE 58 ANALYZES HARD

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL
((NIL) ((:34)) ((:6)) ((:36)) ((:24) G0026 etc.
LOCAL
(LOCAL ASSUMES (:11) (:12) SAME BODY)
(LOCAL ASSUMES (:15) (:16) SAME BODY)
((NIL) ((:34)) ((:6)) ((:36)) (NIL) (NIL) ((:7)) (NIL) etc.
LOCAL
(((:12 :11) G0067) ((:16 :15) G0066) ((:32 :31 :30) G0033 etc.
LOCAL
SMB
(SMB ASSUMES :7 :6 SAME BODY)
RESULTS
(BODY 1. IS :12 :11)
(BODY 2. IS :16 :15)
(BODY 3. IS :32 :31 :30)
(BODY 4. IS :9 :10 :8)
(BODY 5. IS :18 :19 :20)
(BODY 6. IS :13 :17 :14)
(BODY 7. IS :5 :4)
(BODY 8. IS :1 :2 :33)
(BODY 9. IS :24 :22 :3 :23 :21 :28 :29)
(BODY 10. IS :25 :26 :27)
(BODY 11. IS :7 :6)
NIL

RESULTS FOR HARD
RESULTS. After having screened out the regions that belong to the
background, the nuclei are printed as "bodies".

In this process, the links which may be joining some of the
nuclei are ignored: RESULTS considers the links of figure
'FINAL-BRIDGE', for instance, as non-existent. These links
are the result of imperfections in the heuristics and mistakes in the
placement of links, and may point out different parsings. An
improvement to SEE would be to try to "explain" these residual links.

Summary. SEE uses a variety of kinds of evidence to link together
regions of a scene. The links in SEE are supposed to be general
enough to make SEE an object-analysis system. Each link is a piece
of evidence that suggests that two or more regions come from the
same object, and regions that get tied together by enough evidence
are considered as "nuclei" of possible objects.

Examples and discussion are in the next section.
ANALYSIS OF MANY SCENES

Until we have an adequate analytic theory, the behavior of a 
heuristic program Is best understood with examples. There are 
several ways to go about this: 

Simple _ 

— ^^— In order to learn what a program does, simple examples, each 

one illustrating a single feature or group of features, are very 

appropiate. 

Favorable. A shiny impression of a set of routines is obtained by
presenting 'favorable' cases, designed to enhance the characteristics
of the program in front of the unsophisticated observer.

Of course, of all possible inputs, there is a subset that will
produce outputs very pleasant in terms of speed, ease of programming,
generality, accuracy, or whatever other feature the system
advertises. This subset tends to get the highlights in the
descriptions.

Nasty. Examples in which the program does particularly poorly are
useful, if well chosen, to illustrate the weak points and pitfalls
of the techniques used, the restrictions and constraints on the input,
etc. They may point out improvements or extensions.

Silly. Examples having a very weak connection with the purpose or
intention of the routines or algorithms discussed serve no useful
end, except perhaps to show that the maker of such examples did
not understand the issues. For instance, one could take a box full
of pins, drop them on the table, take their picture and ask SEE to
work on it.



A collection of simple, favorable, and nasty examples follows. 
They are not in that order. 

A discussion is found at the end of this section. 
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Stereo Scenes. Analysis of stereographic pictures will be found in
the section 'Stereo Perception'.

Finding the background. Examples where the background is not known
in advance and has to be deduced are given in the section 'Background
Discrimination by Computer'.






LIST OF SCENES ANALYZED BY SEE IN THIS SECTION

                          PAGE
Name       Comments    Scene    Results
R17          107        108       109
L3           110        111       112
R3           113        114       115
SPREAD       116        117       118
STACK        119        120       122
STACK*       119        121       122
L10          123        124       125
R10          126        127       128
TOWER        129        130       131
REWOT        132        133       134
WRIST*       135        136       137
L2           138        141       142
R2           138        139       140
L19          143        144       145
R19          146        147       148
CORN         149        150       151
L9           152        153       154, 155
R9           156        158       157
R9T          156        159       160
TRIAL        161        162       163
ARCH         164        165       166
HARD         167        168       169
L4           170        171       172
R4           173        174       175
MOMO         176        177       178
BRIDGE       179        180       181
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Scene R17 The three prisms are found. In scenes like this, the
position of one or two vertices may alter the analysis made by SEE,
by radically changing the slope-direction of a small segment (such
as KL and GH, figure 'R17'), killing several T-joints and separating
regions :1-2 from :5-6.

Small errors in the coordinates of vertices K, L, G, H, and a few
others will drastically change the slope of short segments. This
will turn G and K into Arrows or Forks, so that G and K will no
longer be matching T's (cf. also 'Conservatism and Tolerance', page
173). As a consequence, body :2-1 will be disconnected from body
:5-6. This annoying problem is not difficult to correct at the
preprocessor level, since there is good information about the slope
of the (long) line BN: the slope of KL has to agree with the slope
of BN, which gives a good estimate of its true direction.
| SUGGESTION | The rule seems to be that these short segments should
be "re-oriented" if necessary, to agree with the longer ones, which
are more reliable. Deeper analysis is found in section 'On Noisy
Input'.

| SUGGESTION | The preprocessor should consider the hypothesis that
BKLN are colinear, or SEE should propose it for confirmation (see
'Division of Work in Computer Vision').
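The following Python sketch shows the re-orientation rule. It assumes the preprocessor already suspects that the short segment KL lies on the long line BN. The sketch does not trust KL's own slope. Instead, it projects each endpoint onto BN when the endpoint lies within a distance bound max_offset of that line. The function names and the bound are our own, not part of SEE.

```python
import math

def project_onto_line(p, a, b):
    """Orthogonal projection of point p onto the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + t * dx, a[1] + t * dy)

def reorient_short_segment(k, l, b, n, max_offset):
    """Snap the short segment KL onto the long line BN when both endpoints
    lie within max_offset of BN; otherwise return KL unchanged."""
    k2 = project_onto_line(k, b, n)
    l2 = project_onto_line(l, b, n)
    if math.dist(k, k2) <= max_offset and math.dist(l, l2) <= max_offset:
        return k2, l2
    return k, l
```

Take B = (0, 0), N = (100, 0), K = (40, 0.8) and L = (44, -0.6). The raw slope of KL is -0.35, about 19 degrees, even though both endpoints lie within one unit of BN. After projection, KL is horizontal. Requiring `math.dist` limits the sketch to Python 3.8 or later.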

The %: signs In the printouts of some scenes, such as R17 (see 'RESULTS
FOR R17', page 109), a % sign appears as part of the name of every
region and vertex; that is, %:3 instead of :3. This will be the case
in all scenes whose names start with the letter R, differentiating
the "right regions" from the "left regions". This will become clear
in the section 'Stereo Perception'; until then, disregard the %.
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FIGURE 'R17'

The three prisms were correctly found. 
There are several "nasty" coincidences 
in this scene, simulating the data 
that a not-too-satisfactory preprocessor 
will tend to provide. 
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Scene L3 Without difficulty, two bodies are found. Each region
contains four strong links relating it to other regions (see
'RESULTS FOR L3'). LOCAL is not needed to form nuclei; neither is
SINGLEBODY or SMB.

Explanation of the printout produced by the program. For every scene
analyzed, a printout of the results appears. The format is the same
for every scene. It starts by saying

SEE 58 ANALYZES L3

which identifies the name of the program (SEE), its number (version
number 58), and the scene to be analyzed (L3).

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

The different sections of the program print their names when they
are entered.

We then come to a list containing regions (such as :6) and 'gensyms'
(such as G0009):

((NIL) ((:6) G0009 G0007 G0005 G0004) ((:5) G0010 G0006
G0007 G0004) ((:4) G0010 G0009 G0006 G0005) ((:1) G0015
G0013 G0012 G0011) ((:2) G0016 G0014 G0013 G0011)
((:3) G0016 G0015 G0014 G0012) ((:7)))



This list contains the nuclei and the links (strong links); the first
nucleus that we see is ((:6) G0009 G0007 G0005 G0004), meaning
that four links emanate from nucleus (or region) :6, namely G0009,
G0007, G0005 and G0004. We can represent this graphically:




The total representation of the above list is thus 





We then see "LOCAL" (when this function is entered, it prints its
name), then the list of nuclei again, this time shrunk somewhat by
LOCAL; finally, we see "RESULTS", then 2 bodies, followed by NIL,
meaning the end of the program (see page 112).
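Every link gensym is listed under both nuclei it joins, so the graphical representation can be recovered mechanically. The Python sketch below pairs up the two nuclei that list each gensym. The function name links_to_edges is ours, not SEE's. The test data are the nuclei :6, :5, :4 and :7 from the L3 printout above.

```python
from collections import defaultdict

def links_to_edges(nuclei):
    """Turn SEE-style nucleus entries into region pairs, one per link.

    nuclei: dict mapping a nucleus name to the link gensyms listed with it.
    Each gensym is expected to appear under exactly two nuclei.
    """
    owners = defaultdict(list)
    for nucleus, gensyms in nuclei.items():
        for g in gensyms:
            owners[g].append(nucleus)
    edges = {}
    for g, pair in owners.items():
        if len(pair) != 2:
            raise ValueError(f"link {g} is listed under {pair}, expected two nuclei")
        edges[g] = tuple(sorted(pair))
    return edges
```

For these four nuclei the sketch yields six links, two for each of the pairs :4-:5, :4-:6 and :5-:6. This agrees with the remark that each region has four strong links. :7 has no links at all.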
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Scene R3 Two bodies are found in this scene. Vertex T is
classified as of type 'T'; hence only one link exists there between
:2 and :4.

All scenes have regions, vertices and lines (edges) joining
vertices and separating regions. We generally omit the names of the
vertices from the drawing (figure 'R3'); we also omit the
coordinate axes.

Since each region has an inside and an outside, the following
are invalid or illegal configurations in a scene:




A line ending nowhere is illegal.




Our scenes should be such that, to disconnect a separate component
of the graph into two components, we have to remove (delete) at
least two edges. The graph above is "illegal" as input to our
program, since the criterion is not met: removing edge B will
disconnect the graph (cf. page 31).

Incidentally, some optical illusions are "recognized" or rejected
because they come from illegal scenes of the type just described
(cf. section 'Optical Illusions').

See 'Illegal scenes', page 217, in section 'On Noisy Input'.
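In graph-theory terms, the legality criterion says that the line drawing has no bridges: no single edge whose removal disconnects a component. The Python check below is a minimal sketch that uses the standard depth-first bridge-finding method, not anything taken from SEE. Edges are given as vertex pairs, and parallel edges are kept apart by their index.

```python
def find_bridges(edges):
    """Return the edges whose removal disconnects the graph.

    edges: list of (u, v) pairs; parallel edges are distinguished by index,
    so a doubled edge is never reported as a bridge.
    """
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    disc, low, bridges = {}, {}, []
    counter = 0

    def dfs(u, parent_edge):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v, i in adj[u]:
            if i == parent_edge:
                continue  # do not walk back along the edge we came from
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # nothing below v reaches above u
                    bridges.append(edges[i])

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges

def is_legal_scene(edges):
    """A scene is legal when no single edge removal disconnects it."""
    return not find_bridges(edges)
```

A square is legal. A square with one extra dangling edge D-E is not, because that edge is reported as a bridge. This matches the remark that a line ending nowhere is illegal.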
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R 3 





FIGURE 'R 3' 
A scene analyzed by the program. 
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Scene SPREAD Body :41-42 was found; also :8-18-19. In the first
case, there was one strong link between :41 and :42, because of
heuristic (g) of table 'GLOBAL EVIDENCE' (page 87), and SINGLEBODY
completed the object. In the second case, heuristic (g) could not
be applied, and SMB had to join :19 with :18.

Bodies :29-30-31-32 and :25-26-27-28 are adequately found. The
badly occluded long body :10-9-11-12-3 is also found.

Body :21-6-5-20 is found as one body. An older version of
SEE {Guzman FJCC 68} used to report two: :6-21 and :5-20. The
change is as follows: one link is placed between :6 and :5 because of
the matching T's; the other link is a weak one, placed because :5 and
:20 form a LEG; a weak link is also placed between :6 and :5.

:24 gets reported isolated, instead of together with :22-23,
because no Leg is seen; but see the comment in section 'Simplified
View of Scene Analysis'.

SEE tries to find a "minimal" answer, minimal in the sense that
it tries to explain the scene with the minimum possible number of
bodies (cf. section 'The Concept of a Body'). That is why :41 and
:42 were joined in one body instead of two, which is another possible
correct answer. The same is true of :19-18-8, interpreted as one
parallelepiped with a vertical face (:19) and a horizontal face
(:18-8).

The background of SPREAD is also computed (see page 226 of section 
'Background Discrimination by Computer'). 
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SPREAD 




FIGURE 'SPREAD' 

Bodies :10-9-11-12-3 and :6-21-5-20 are properly found. The body
:19-18-8 is also correctly identified; it is a parallelepiped with a
vertical face (:19) and a horizontal face (:8-18).
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Scenes STACK and STACK* In both cases all bodies are accurately
identified by our program, which is written in LISP. In particular,
the body :4-15-16 is found in both.

These scenes show that in many instances one could drastically 
alter the position of a vertex, without modifying the output of SEE 
(compare figure 'STACK' with 'STACK*'). 

Other examples would show that vertices of type 'L' can be
arbitrarily displaced without detrimental effect, so long as their
type remains 'L' and other vertices do not change type. Such a
displacement may affect some heuristics that use concepts of
parallelism or colinearity, but not the rules that use the shape or
type of a vertex (cf. table 'VERTICES', page 69) for placing and
inhibiting links. Read 'Misplaced vertices' in section 'On Noisy
Input'.
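For the common two- and three-line junctions, the vertex type depends only on the angles between the lines leaving the vertex. That is why an 'L' can wander so far without harm. The Python classifier below is a simplified sketch. It assumes a single angular tolerance, colinear_tol, for deciding that two lines form the straight bar of a T. SEE's own tests use SINTO and COLTO, which are not reproduced here. Junctions with other numbers of lines (X, K, PEAK, MULTI) are simply returned as "OTHER".

```python
import math

def vertex_type(center, ends, colinear_tol=math.radians(2)):
    """Classify a junction from the far ends of the lines that meet at it.

    Handles only two- and three-line junctions; anything else is "OTHER".
    """
    two_pi = 2 * math.pi
    # Direction of each line leaving the vertex, in [0, 2*pi).
    angles = sorted(math.atan2(y - center[1], x - center[0]) % two_pi
                    for x, y in ends)
    if len(angles) == 2:
        return "L"
    if len(angles) != 3:
        return "OTHER"
    # Angular gaps between consecutive lines going around the vertex.
    gaps = [(angles[(i + 1) % 3] - angles[i]) % two_pi for i in range(3)]
    largest = max(gaps)
    if abs(largest - math.pi) <= colinear_tol:
        return "T"      # two lines form a straight bar
    if largest > math.pi:
        return "ARROW"  # all three lines fit in a half-plane
    return "FORK"       # every gap is smaller than a half-turn
```

Take the T formed by the lines to (1, 0), (-1, 0) and (0, 1). Moving the end of its bar to (-1, 0.1) turns it into an ARROW, and moving it to (-1, -0.1) turns it into a FORK. An L-type vertex, by contrast, stays an L however it is displaced. This is the sensitivity of T-joints mentioned in the comments on scene R10.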
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FIGURE 'STACK'

Every body is correctly identified. Compare with scene STACK*.
This pair of drawings illustrates the fact that it is often
possible to disturb the coordinates (the position) of a vertex
without introducing errors in the recognition.
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STACK 




FIGURE 'STACK*' 
Every body is correctly found. Compare with scene STACK. 
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Scene L10 The concave object :11-15-14-7-6 presents no
problem, since there are plenty of visible vertices
(figure 'L10'), and SEE makes good use of them.

SINGLEBODY is necessary to join regions :13 and :2.

The bodies of a scene need not be prismatic in shape, nor convex.
Their vertices may have errors in their two-dimensional position.
Table 'ASSUMPTIONS' (page 255) specifies the suppositions that
our program obeys.
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L 10 




FIGURE 'L10' 

SINGLEBODY had to join :2 with :13.

All four bodies were happily identified. 
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Scene R10 Four bodies are found by our program in R10.



The scene is a good example of a "noisy" scene, in which edges that
should be straight look crooked. This is because the coordinates
of each vertex are "imprecise"; the vertices have some error in
their coordinates. Other scenes also show this tendency; they
accurately represent the data analyzed by SEE (the scenes in their
final form were drawn by program, then inked manually), and should
not be considered "sloppy drawing jobs".



SEE has several ways to cope with these imperfections:

(1) Tolerant definitions of parallelism and colinearity.

(2) Insensitivity of heuristics to displacements of the vertex.
For instance, vertex V will inhibit the link that Z proposes
either when V is of type 'Arrow' or when it is of type 'T'
(but not when it is of type 'Fork').

(3) Large variations in the coordinates of a vertex are possible
before that vertex changes type. Vertices of type 'T' are an
exception: a small displacement changes them into a Fork or an
Arrow.

[Figure: a 'T' vertex changing into an Arrow or a Fork.]

Nevertheless, it is possible to "straighten" these vertices, 
by following the suggestion in the comments to scene R17. 

The section 'On Noisy Input' deals with these matters. 
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R 10 




FIGURE 'R10'

The scene contains "noisy" vertices; hence, some
edges look bent. SEE has resources to cope with these
problems.

Figures L10 and R10 form a stereo pair. In figure
'L10 - R10', information from both scenes
is combined to find the position of these objects in
three-dimensional space. See section 'Stereo Perception'.
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Scene TOWER There is no need to make use of LOCAL or SINGLEBODY
in this scene, since there are plenty of global (strong) links
among the different regions. :18-22 and :17-23 get links thanks
to the heuristic that analyzes vertices of type 'X'.

There are several "false" vertices, formed by coincidences of
edges and "genuine" vertices: the vertex common to :9, 11, 12 and 13;
the one common to :2, 4, 5, 6. They do not cause problems, because

(1) the vertex common to :9, 11, 12 and 13 is of type 'MULTI', and
no link is laid there;

(2) the vertex shared by regions :2, 4, 5, and 6 is an 'X' that will
establish one link between :4 and :5 (which is correct), and another
between :2 and :6 (which will do no harm, since two "wrong" or
misplaced links are needed to cause a recognition mistake).

Compare with scene 'REWOT'.
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TOWER 




FIGURE 'TOWER' 

A "wrong" link is placed between :2 and :6,
without serious consequences. Results for
this scene are in 'RESULTS FOR TOWER'.
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Scene REWOT This scene (see figure 'REWOT') is the same as
scene TOWER (see figure 'TOWER'), but upside down. The program
obtains identical results for both scenes (see 'Results for Tower'
and 'Results for Rewot'), because SEE does not use information about
a body supporting or leaning on another body. For instance, it
was not assumed that body :1-2-3 is partially supporting body
:4-5-15 (in figure 'TOWER'); clearly this assumption fails in the
case of figure 'REWOT'. But since the assumption is not made, the
program succeeds in both cases (it gives the same results).

See table 'ASSUMPTIONS' (page 255) for the suppositions that the
program makes and the presumptions that it does not need.

The regions :16 and :24 had to be marked as part of the
background, following standard practice (cf. 'Input Format').
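The upside-down scene is a point reflection of the original, which amounts to a 180-degree rotation. Such a transformation preserves the angles between the lines at every vertex, as well as parallelism and colinearity. Evidence that depends only on vertex shapes, parallelism and colinearity should therefore come out the same. The small Python check below illustrates this, taking the scene size of 100 from the caption of figure 'REWOT'. The function names and the test vertex are ours.

```python
import math

def flip(point, size=100.0):
    """Turn a scene upside down as in REWOT: X -> size - X, Y -> size - Y."""
    return (size - point[0], size - point[1])

def angular_gaps(center, ends):
    """Sorted angular gaps (radians) between the lines leaving a vertex."""
    two_pi = 2 * math.pi
    angles = sorted(math.atan2(y - center[1], x - center[0]) % two_pi
                    for x, y in ends)
    n = len(angles)
    return sorted((angles[(i + 1) % n] - angles[i]) % two_pi for i in range(n))
```

The vertex type (L, Fork, Arrow, T) is decided by these gaps alone. Identical gaps before and after the flip therefore mean identical vertex types.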
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FIGURE 'REWOT'

This scene is the same as the scene TOWER,
but with Y replaced by 100 - Y, and
X replaced by 100 - X: it is upside
down. SEE still finds eight bodies.
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SEE 58 ANALYZES REWOT

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

[lists of nuclei and links, illegible in this copy]

LOCAL

[list of nuclei after LOCAL, illegible in this copy]

LOCAL
SMB
RESULTS

(BODY 1. IS :2 :3 :1)
(BODY 2. IS :15 :5 :4)
(BODY 3. IS :23 :17)
(BODY 4. IS :6 :7 :8)
(BODY 5. IS :10 :11 :9)
(BODY 6. IS :13 :14 :12)
(BODY 7. IS :18 :22)
(BODY 8. IS :20 :19 :21)
NIL

RESULTS FOR REWOT
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Scene WRIST* The concave objects are properly identified. W places
a link between :23 and :4, and another between :30 and :4. CC does
not inhibit the link between :17 and :19 ordered by the Arrow HA,
because NOSABO was never called, since the first rule of 'ARROW'
was applied.

The only mistake was that objects :9-7-6 and :10-5 should have been
fused and reported as only one. There is a link between :9 and :10,
put by heuristic (g) of table 'GLOBAL EVIDENCE'. It is not enough.
There is also a weak link between 'Triangles' :5 and :6. OB is not
a 'Leg', so there is no weak link between :10 and :5. The situation
is as follows (see the chains of links in 'RESULTS FOR WRIST*'; how to
read these chains is explained on page 110, 'Explanation of the
printout produced by the program'):





:10 and :5 will get joined later by SINGLEBODY.

Almost the same thing occurs with :1-2-22-21, but in this case
vertex A produces one strong link between :22 and :21, and vertex R,
by heuristic (g) of table 'GLOBAL EVIDENCE', also links :22 with :21.
This is enough.





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WRIST 




FIGURE 'WRIST*' 

Instead of one, two bodies were found in :9-7-6 and :10-5.
Insufficiency of links was the offending reason. All other
objects were correctly found.
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Scenes L2 and R2 Two objects are found, as expected.

These scenes form a stereographic pair: two pictures taken of
the same scene from slightly different locations, keeping the optical
axes of the cameras parallel and the magnification the same. A
program, not yet completed, is designed with the following ideas.
Left and right pictures are processed independently by SEE (L2 and
R2 in this example). The answers are

ANALYSIS OF L2                    ANALYSIS OF R2

(BODY 1. IS :2 :4)                (BODY 1. IS %:1 %:2 %:4)
(BODY 2. IS :1 :5 :3)             (BODY 2. IS %:3 %:6 %:5)

The question now is: is body :2-:4 the same body as %:1-%:2-%:4,
or is it %:3-%:6-%:5? After the decomposition of the scene into
bodies, the left bodies have to be matched with the right bodies.
If this is accomplished, one can then locate the figure in
three-dimensional space from the two-dimensional coordinates of the
figure in the left and right scenes.

In this way it will be known where these objects are located in 
the "real world". 

This "matching" is complicated by the following:

- The number of objects observed in one view may differ from the
number in the other.

- On a given object, SEE may make a mistake in the left view but
not in the right view; as a consequence, two bodies on the left
have to be matched with one on the right.

If the two axes of the cameras lie in a horizontal plane, a vertex
in the left scene and its corresponding vertex in the right scene
(if visible) will have the same y-coordinate, such as H in L2 and
%I in R2. Other known relations exist, derived from the relative
position of the camera axes, the magnification, etc. See section
'Stereo Perception'.
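The horizontal-axes relation suggests a first step for the matching: pair the vertices whose y-coordinates agree. The Python sketch below does this with a tolerance y_tol. It then converts the horizontal disparity of a matched pair into depth with the standard parallel-camera pinhole formula Z = f B / (x_left - x_right). That formula is textbook stereo geometry, not taken from this thesis, and the function names are ours. The coordinates in the test are made up; only the names H and %I come from L2 and R2.

```python
def match_by_row(left, right, y_tol=1.0):
    """For each left vertex, list the right vertices whose y agrees within y_tol.

    left, right: dicts mapping vertex names to (x, y) coordinates.
    """
    return {name: [r for r, (_, ry) in right.items() if abs(ry - ly) <= y_tol]
            for name, (_, ly) in left.items()}

def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth of a point seen by two parallel cameras (standard pinhole model)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("the point must lie further left in the right image")
    return focal_length * baseline / disparity
```

A row test alone may leave several candidates per vertex. Those cases are where the body-level matching described above, and the complications listed, come in.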
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R 2 




FIGURE "R 2" 
Two bricks are found.
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L 2 




FIGURE 'L2'



Even if (possibly) a face of object :4-2 is missing,
in this case SEE makes the correct identification.
Section 'On Noisy Input' deals with imperfect
information.
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Scene L19 The small triangle :15 just could not get joined with
the remainder of the body :16-20-19, and two objects were found.
There is a weak link between :15 and :19, but it did not help, since
there is no link between :15 and :16. What happens is that regions
:1, :15, :13 and :22 all meet, forming a vertex of type MULTI; this
vertex should (in some future version of SEE) be split into two,
since both :1 and :37 are background. The rule for this splitting
seems to be:





:11 was joined with :4, but isolated from :12-27-5. There are
no T-joints between these two nuclei that could give 'hints' (i.e.,
links) for their unification.

The two large concave objects were properly isolated.

Compare with R19 and WRIST*.
See 'Merged vertices', page 221, in section 'On Noisy Input'.
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L 19 




FIGURE 'L19'

It was easy to find :6-7-8-9, the hexagonal prism.
:15 was reported as a single object: a mistake. The two big
concave objects were appropriately identified.
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Scene R19 As in L19, here the triangle :27 is detached from
:5-32-33, and two bodies are reported. There is no strong link between
:27 and :33. There is a weak link between :27 and :5, because both
are 'triangles' facing each other, but that is not enough. A weak
link is never enough.

All other bodies are properly found, including :10-16-2-3.

Vertex RA, of course, contributes no links. | SUGGESTION | The
situation could change if we discover that RA is a false vertex,
that is, one formed by the merging of two genuine ones. There is
enough information, I think, since :34 and :37 are background,
and this will suggest a way to "divide" vertex RA into two simpler
ones. This idea of dividing vertices of type MULTI into simpler
ones should be applied with caution, since there will be genuine
vertices of type MULTI (which should not be split). The main use of
this technique will be to help single regions join some other
body, a task now performed, not too satisfactorily, by SMB.

Compare with L19 and WRIST*.
See 'Merged vertices', page 221.
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:27 was separated from :5-32-33. All other
objects were correctly found.
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Scene CORN The pyramid :8-9-10 was easily identified because a vertex
of type PEAK produces many links. At the bottom, bodies :1-2-3-4 and
:12-13-11 were separated, because the fork between :4 and :12 has the
background as one of its regions, and did not contribute any links.
Certainly, this is a possible interpretation. Another interpretation
is to regard the object :1-2-3-4-11-12-13 as a prism with the shape
of a "C".

SINGLEBODY was needed to join :4 with :2-3-1, the only link
being placed by heuristic (g) of table 'GLOBAL EVIDENCE'.
The program knows that :22 is the background.
If we could see the hidden vertex KK (if it indeed exists),
two links would be put and we would have had one body:
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FIGURE 'CORN'

The pyramid at the top was identified 
properly. Two bodies were found at 
the bottom, which is a plausible 
interpretation: : 1-2-3-4 and : 11-12-13. 
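The behavior of the Fork in scene CORN can be written as a one-line rule. A Fork links each pair of its three regions, unless one of them is background, in which case it proposes nothing. This Python sketch is a simplification inferred from the comment on CORN. SEE's full Fork rule, with its inhibitions, is in table 'VERTICES'. The function name and the non-background regions :1, :2, :3 in the test are hypothetical.

```python
from itertools import combinations

def fork_links(regions, background):
    """Links proposed by a Fork, following the behavior described for CORN.

    regions: the three regions meeting at the Fork.
    background: set of region names known to be background.
    """
    # A Fork touching the background contributes no links at all.
    if any(r in background for r in regions):
        return []
    # Otherwise link every pair of its regions.
    return list(combinations(regions, 2))
```

For the fork between :4, :12 and the background :22, the sketch returns no links. This is why :1-2-3-4 and :11-12-13 stay apart.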
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Scene L9 Here the tolerances SINTO and COLTO that allow for
"sloppy parallelism" have made T's out of NA and FA. Therefore,
these vertices do not contribute any links for :1. Moreover, the
"T" FA inhibits the link suggested by QA between :1 and :8.

That being all, :1 gets reported as a single body (see next page).

By decreasing the tolerances, correct identification is possible
(see the correct identification on page 155).

See 'Tolerances in collinearity and parallelism'.
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Scenes R9 and R9T Four bodies are found in R9, five in R9T. The
difference is that Y and JA (see figure at bottom of this page) are not
"matching T's" in R9T. The strong links among :12, :3, :10, and :16 are:





LINKS FOR R 9 



LINKS FOR R 9 T 



In R9, the two strong links (G0030 and G0021) between :12 and :10
were put by the matching T's Z-EA and Y-JA; of the two strong links
between :10 and :16, one was put because DA is an arrow, and the
other because EA is a "T" for which heuristic (g) of table 'GLOBAL
EVIDENCE' applies.

But in scene R9T, Y and JA not being matching T's, a link
between :10 and :12 disappears; also, nuclei :16 and :10 cannot
be linked by heuristic (g) of table 'GLOBAL EVIDENCE'. SEE decides
to report two bodies there, :3-12 and :16-10, instead of one
as in scene R9.




Are Y and JA matching 
T's or not? Different 
answers produce different 
analyses of the scene. 

These scenes show that the analyses can be quite sensitive to 

the "right" definition of parallelism and colinearity. 
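The R9/R9T comparison turns on whether two T-joints "match". The Python sketch below is a simplified test under one assumption: two T's match when the stem of each one, prolonged through its T, points at the other T within an angular tolerance tol. This criterion and the name matching_ts are our assumptions, not a transcription of SEE's routine. The point is only to show how a small change of tolerance flips the answer.

```python
import math

def _direction(p, q):
    """Direction of the ray from p toward q, in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def _angle_diff(a, b):
    """Absolute difference between two directions, folded into [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def matching_ts(t1, stem1, t2, stem2, tol=math.radians(3)):
    """Hypothetical matching-T test. stem1 and stem2 are the far (visible)
    ends of the stems; each stem, continued past its T, must aim at the other T."""
    ok1 = _angle_diff(_direction(stem1, t1), _direction(t1, t2)) <= tol
    ok2 = _angle_diff(_direction(stem2, t2), _direction(t2, t1)) <= tol
    return ok1 and ok2
```

Take stems lying along y = 0 and y = 0.5, so that each one aims about 1.4 degrees off the line joining the two T's. The pair matches under a 3-degree tolerance but not under a 1-degree one.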
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R9 




FIGURE 'R 9' 

The four bodies were found. 
SINGLEBODY was needed to join :18
with :6-11-1-4-2.
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R9T




FIGURE 'R9T'

SINGLEBODY joins :18 with the
other portion of that body; LOCAL
is needed to join :6 to that
portion, and :16 with :10.
Nevertheless, since :12 and :10 were
not found to be the same face, body
:16-10 is found, and also body :12-3.
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Scene TRIAL This scene has been analyzed in great detail in the
section that describes the program SEE. Its links are found in
graphic form in figure 'TRIAL - LINKS', or in written form (lists)
in 'RESULTS FOR TRIAL'.

LOCAL had to join :13 with the remainder of that body. 
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TRIAL 




FIGURE 'TRIAL' 
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Scene ARCH SEE analyzes scene ARCH (see figure 'ARCH') with results
displayed in 'RESULTS FOR ARCH'. This is a scene composed of many
degenerate views of objects. It is an ambiguous scene (see section
'Optical Illusions'), in that several good interpretations are
possible.

The program reports :7 and :17 as one body, which could be
plausible. :16, :9 and :10 get reported as independent objects. In
the scene from which this picture or line drawing was taken, :7, :17
and :16 were the vertical face of an object. :10 was the vertical
face of another, :9 being its horizontal (top) face. In cases like
this, in order to choose the "right" one of several possible
interpretations, more information has to be supplied to the program,
such as lighting, textures, color, etc.

No link was put by A between :3 and :29, or by UB between :5 and
:19, because D and W are GOODTs. In the first case, G provides more
links and causes :3-8-29-31 to be reported as one body, which is
correct; in the second case, Q cannot supply any links, and that
body is split in two: :5-4 and :19-18. This is a mistake of GOODT,
which accepts W as a genuine T. If this were not the case, the arrow UB
would establish a link between :5 and :19, avoiding the mistake. GOODT
could stand some improvement.

The body :22-23 was identified correctly.






ARCH 




FIGURE 'ARCH'



Ambiguous scene that could be correctly interpreted in
several different manners. :7-17 was reported as a single
body (see table 'RESULTS FOR ARCH'), and so was :9.

The body :5-4-19-18 was split in two, :5-4 and :19-18,
but not :3-8-29-31, which was counted as one body.
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Scene HARD This scene consists of objects of the same shape, namely
triangular prisms. All are correctly identified, including the long
and twice-occluded :3-21-22-23-24-28-29. :1-2-33 was also found.
LOCAL had to be used to join :15 with :16, and also :11 with :12.

In an older version of the program, :7 was identified as a single
body, and :6 as another, because they have no visible "useful"
vertices to place links {Guzman PISA 68}. Now SEE joins :6 and :7,
because both are "GOODPALs". See 'Operation of the Program; SMB'
(page 99).

These scenes are sometimes obtained from a picture, so that
they are the result of a perspective transformation. Other
scenes are drawn more or less in an orthogonal or isometric
projection. SEE does not depend heavily on the type of projection;
only a few heuristics use notions of parallelism.
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HARD 




FIGURE 'HARD' 

All the bodies were correctly found. 
The most difficult was :6-7, since SMB 
had to join both regions, which do 
not have "useful" visible vertices. 
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Scene L4 The body :10-9 was reported isolated from :13-2-3,
due to insufficiency of links. See also the comments to figure R17.

The algorithm that localizes matching T's could stand improvement.
It sometimes produces "bad links", such as between :4 and :13, and
between :6 and :3, because it found two T's that looked like they
were matching, EA and R in this case (this mistake did not actually
happen, because vertex R is not a T but a Fork). The suggestion of
straightening short segments (see the comments to scenes R17 and R4)
will lessen, but not suppress, these "mistakes".
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Body :2-3-13 was isolated
from body :10-9: too few T-joints.
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Scene R4 The table 'RESULTS FOR R4' shows what happens when the
tolerances are too large. Five bodies are found. Vertex B is
considered to be a "T", and inhibits the links suggested by the
Arrows R and A. As a result, :1 gets cut off from :7-9-5-10.

The way :2 gets isolated is as follows: T and AA claim to be
matching T's, the link suggested by U is inhibited by Z (a Corner),
and :2 gets disconnected from :3-4.

The correct solution is obtained after reducing the values of
COLTO and SINTO to 0.05 and 0.005, respectively (see listings; COLTO
decides whether two lines are colinear, SINTO whether they are
parallel). The results also appear in 'RESULTS FOR R4', and we can
see now that only three bodies (the correct ones) are identified.

| SUGGESTION | Lines like the one below should be "straightened",
either by SEE or (better) by the preprocessor; for example, B K L N
and D G H in figure R17. See section 'On Noisy Input'.



Conservatism and Tolerance Stricter tolerances do not make the
program more conservative in all cases. The link in (a) fails to be
placed if the program has too loose (large) tolerances, because A
will be considered to be a "T", and the link is lost. The link in (b)
fails to be laid if the tolerances are too strict, because the
T-joints will not be colinear.





In (a) , links disappear if tolerances are 
too big; ln (h) , if they are too small. 
In both cases, conservative behavior (cf. 
page H4) appears. 
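Case (a) can be reproduced with a toy classifier for three-line vertices. It assumes a vertex is a T when two of its lines are opposite (collinear through the vertex) within an angular tolerance, an ARROW when some angle between adjacent lines exceeds 180 degrees, and a FORK otherwise; these rules and the function name are illustrative simplifications, not SEE's listing.

```python
def classify_three_line_vertex(angles_deg, tol_deg):
    """Classify a vertex from the directions (degrees) of its three lines.

    Simplified, hypothetical rules:
      T     - some pair of lines is opposite within tol_deg,
      ARROW - some gap between adjacent lines exceeds 180 degrees,
      FORK  - otherwise.
    """
    a = sorted(x % 360 for x in angles_deg)
    gaps = [a[1] - a[0], a[2] - a[1], 360 - a[2] + a[0]]
    # Two of three lines are opposite exactly when the arc between them
    # that holds no third line measures 180 degrees.
    if any(abs(g - 180) <= tol_deg for g in gaps):
        return "T"
    if max(gaps) > 180:
        return "ARROW"
    return "FORK"
```

With lines at 0, 90 and 176 degrees, a 2-degree tolerance leaves the vertex an ARROW, which can place its link; a 5-degree tolerance turns it into a T, and the link is lost, as in (a).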
FIGURE 'R4'

Either three or five bodies are found, depending on the values of certain parameters. These scenes are "noisy" in the sense that the coordinates of the vertices depart from their "ideal" position by as much as one millimeter, or about 1% of the total size of the image, which is about one decimeter. This error is not large enough to affect long lines, but it may substantially change the direction of short segments.
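A quick calculation makes the last sentence concrete. Assuming each endpoint may move by up to d = 1 mm in any direction, the difference of the two displacements lies in a disk of radius 2d, so a segment of length L can rotate by up to asin(2d/L). This worst-case model is an illustration, not part of the original analysis.

```python
import math

def worst_direction_error_deg(length_mm, noise_mm=1.0):
    """Largest change in a segment's direction, in degrees, when each
    endpoint may move by up to noise_mm in any direction."""
    r = 2 * noise_mm
    if r >= length_mm:
        return 180.0  # the noise is as large as the segment: direction is unreliable
    return math.degrees(math.asin(r / length_mm))

for length in (100, 30, 10, 5):
    print(f"{length:>4} mm segment: up to {worst_direction_error_deg(length):.1f} degrees")
```

Under this pessimistic assumption, a line spanning the whole 10 cm image turns by about one degree, while a 5 mm segment can turn by more than twenty.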
Scene MOMO. The long body :29-30-34-20-19 gets identified as follows: :29 and :30 get two links, and so do :30 and :19, so we have the nucleus :29-30-19. Two links (because of matching T's) join :34 with :20, to form the nucleus :34-20. Regions :30 and :34 receive a strong link by heuristic (g) of table 'GLOBAL EVIDENCE', and so do :19 and :20, for the same reason. With these two links between them, the two nuclei are joined, and that completes the body.
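The merging just described can be sketched as a simplified model, not SEE's procedure: nuclei start as single regions, and two nuclei are merged whenever at least two strong links, or one strong and one weak link, connect them (the second rule follows the BRIDGE example, where a strong and a weak link join :10 to :4). The order of SEE's own merging passes is not reproduced, and treating all MOMO links below as strong is an assumption.

```python
from collections import Counter
from itertools import combinations

def form_bodies(regions, links):
    """Group regions into bodies by repeatedly merging linked nuclei.

    regions: iterable of region names.
    links: list of (region_a, region_b, kind), kind "strong" or "weak";
           a pair may appear several times (one entry per link).
    Merge rule (simplified): >= 2 strong links, or 1 strong + >= 1 weak.
    """
    nuclei = [frozenset([r]) for r in regions]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(nuclei, 2):
            kinds = Counter(k for r1, r2, k in links
                            if (r1 in a and r2 in b) or (r1 in b and r2 in a))
            if kinds["strong"] >= 2 or (kinds["strong"] >= 1 and kinds["weak"] >= 1):
                nuclei.remove(a)
                nuclei.remove(b)
                nuclei.append(a | b)
                merged = True
                break  # restart the scan with the new set of nuclei
    return nuclei

# Only the MOMO links mentioned in the text; all treated as strong here.
momo_links = [
    (29, 30, "strong"), (29, 30, "strong"),
    (30, 19, "strong"), (30, 19, "strong"),
    (34, 20, "strong"), (34, 20, "strong"),  # matching T's
    (30, 34, "strong"), (19, 20, "strong"),  # heuristic (g)
    (12, 13, "strong"), (13, 14, "strong"),  # the shared fork
]
print(form_bodies([29, 30, 34, 20, 19, 12, 13, 14], momo_links))
```

Because only the links named in the text are included, :12, :13 and :14 stay separate here; the single link placed by the fork is not enough to join anything, which is the conservative behavior the text relies on.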

The fork that is common to :12, :13 and :14 puts a link between :12 and :13, but it is not enough to cause misrecognition. That same fork puts a link between :13 and :14, as it should, but the link between :12 and :14 is inhibited by NOSABO.
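The fork's behavior can be sketched as a link generator followed by an inhibition filter. Here a FORK proposes a link between every pair of the regions around it, and a set of inhibited pairs stands in for whatever NOSABO decides; its actual rule is in the listings and is not reproduced. The names are hypothetical.

```python
from itertools import combinations

def fork_links(regions_around_fork, inhibited):
    """Links proposed by a FORK: one per pair of its regions,
    minus the unordered pairs listed in `inhibited`."""
    blocked = {frozenset(p) for p in inhibited}
    return [(a, b) for a, b in combinations(regions_around_fork, 2)
            if frozenset((a, b)) not in blocked]

# The fork shared by :12, :13 and :14 in scene MOMO, with :12-:14 inhibited.
print(fork_links([12, 13, 14], inhibited=[(12, 14)]))  # -> [(12, 13), (13, 14)]
```

The wrong :12-:13 link survives the filter, but as a single link it cannot join two nuclei on its own.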

There is a program that finds the regions of a scene belonging to the background, when they are not indicated as such in the input. For MOMO, the results of this program appear later in this report.
FIGURE 'MOMO'
All bodies are correctly identified.
Scene BRIDGE. Region :10 gets a strong and a weak link with :4, and that is enough to join them. The same is true for :7.

The links of scene BRIDGE (see 'RESULTS FOR BRIDGE') are discussed and displayed in figures 'LINKS-BRIDGE' (page 95), 'NUCLEI-BRIDGE' (page 96), 'NEW-NUCLEI-BRIDGE' (page 97), and 'FINAL-BRIDGE' (page 98).

Because RA and SA are matching T's, two wrong links are placed: one between :22 and :28, and the other between :21 and :29. This is not enough to cause an error, because we need two mistakes reinforcing each other (two wrong strong links) to fool the program. But that could happen.

It is interesting to note the way in which the long "horizontal table" :25-24-21-27-9-12 was put together. To this effect, see figures 'LINKS-BRIDGE' and 'NUCLEI-BRIDGE'.

Vertex JB produces only one link, between :5 and :8. Vertex KB inhibits (through NOSABO) the link between :8 and :9, and the link between :5 and :9 gets inhibited by S, because it is a T (cf. NOSABO, page 82).

The concave object :7-6-5-4-8-10-11 gets properly identified. We may say that, in general, the more "crooked" or complicated an object is, the easier it is for SEE to isolate it, because there will be many vertices contributing valuable links.

No mistake was made by SEE on BRIDGE; its eight bodies were correctly identified (see 'RESULTS FOR BRIDGE').

The background of 'BRIDGE' was also correctly isolated; see section 'On background discrimination by computer'.
FIGURE 'BRIDGE'



DISCUSSION 

We have described a program that analyzes a three-dimensional scene (presented in the form of a line drawing) and splits it into "objects" on the basis of pure form. If we consider a scene as a set of regions (surfaces), then SEE partitions the set into appropriate subsets, each subset forming a three-dimensional body or object.

The performance of SEE shows us that it is possible to separate a scene into the objects forming it without needing to know these objects in detail; SEE does not need to know the 'definitions' or descriptions of a pyramid, or a pentagonal prism, in order to isolate these objects in a scene containing them, even in the case where they are partially occluded.

The basic idea behind SEE is to make global use of information collected locally at each vertex: this information is noisy, and SEE has ways to combine many different kinds of unreliable evidence to make fairly reliable global judgments.

The essentials are : 

(1) Representation as vertices (with coordinates), 
lines and regions 

(2) Types of vertices. 

(3) Concepts of links (strong and weak), nuclei and 
rules for forming them. 

The current version of SEE is restricted to scenes pre- 
sented in symbolic form. 

Since SEE requires two strong pieces of evidence to join two nuclei, its judgments appear to lie on the 'safe' side; that is, SEE will almost never join two regions that belong to different bodies. From the analysis of the scenes shown above, its errors are almost always of the same type: regions that should be joined are left separated. We could say that SEE behaves "conservatively", especially in the presence of ambiguities.

Division of the evidence into two types, strong and weak, results in a good compromise. The weak evidence is considered to favor linking the regions, but this evidence is used only to reinforce evidence from more reliable clues. Indeed, the weak links that give extra weight to nearly parallel lines are a concession to object recognition, in the sense of letting the analysis system exploit the fact that rectangular objects are common enough in the real world to warrant special attention.

Most of the ideas in SEE will work on curves too. 
CURVED OBJECTS

How to extend SEE to work with objects possessing curved surfaces.

Introduction and Summary. Most of the heuristics that establish links at each vertex are unconcerned whether the edges are curved or straight; only a few heuristics are affected: those that use the concepts of collinearity and parallelism.

Thus, it is necessary to redefine and broaden these concepts. 

1. A slight generalization is obtained if each segment is represented as having two slopes (initial and final). The functions PARALLEL and COLINEAR of SEE are already modified for this (cf. listings).

SEE does not care whether the line joining two vertices is a straight or a curved line. The information about the segment A-B that is relevant to SEE is:

(a) There is a line between vertex A and vertex B.

(b) The coordinates of A and B.

(c) The segment A-B separates region :1 from :2.

2. Attempts to take limited account of the shape of the segment lead us to:

(a) Gently bent segments (definition): those with bounded slope. [Bounded curvature will lead to another definition.]

A quasi-rectilinear object has faces, vertices, and gently bent edges or segments; it is expected that SEE will work well for them. We should try some scenes. | SUGGESTION |





a, b: gently bent segments; c: non-gently bent
segment. A gently bent segment has a slope that,
at any point of the segment, does not differ by more
than epsilon from the mean slope of the segment.
All slopes fall in an interval around the mean
slope. Gently bent segments form quasi-rectilinear
objects.
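The definition in the caption translates directly into a test on a sampled curve. A minimal sketch, assuming the segment is given as a polyline of sample points and that slopes are measured as direction angles (which avoids infinite slopes for vertical pieces); `epsilon_deg` plays the role of epsilon. The mean direction is taken as the chord direction: the length-weighted average of the unit tangents equals the chord divided by the arc length, so the two directions coincide.

```python
import math

def _angle_diff(x, y):
    """Smallest absolute difference between two directions, in degrees."""
    d = (x - y) % 360
    return min(d, 360 - d)

def is_gently_bent(points, epsilon_deg):
    """True if every piece of the polyline points within epsilon_deg of
    the segment's mean direction (the chord direction)."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    mean = math.degrees(math.atan2(yn - y0, xn - x0))
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        step = math.degrees(math.atan2(y2 - y1, x2 - x1))
        if _angle_diff(step, mean) > epsilon_deg:
            return False
    return True
```

A shallow arc such as (0, 0), (5, 1), (10, 0) deviates by about 11 degrees from its chord, so it counts as gently bent for epsilon = 15 degrees but not for epsilon = 10 degrees.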








Quasi-rectilinear objects. It is expected 
that SEE will work well for them. 



(b) Partition of a non-gently bent segment into several gently bent ones. Many bodies have vertices and curved edges but are not quasi-rectilinear (a piece of chewed gum, leaves of a tree). By breaking the edges into gently bent sub-segments, they become quasi-rectilinear bodies. The breaks will occur at points where the curvature is large. A way must be devised to break a segment in a unique manner. To avoid breaking a body into two by the introduction of these artificial vertices, we propose to also introduce artificial links between regions, to account for the artificial vertices.

The non-gently bent segment ab
gets broken into gently bent seg-
ments ak, kl, lm, mb, by the
artificial introduction of "new"
vertices k, l, m.

Here, the introduction of 
additional vertices has to 
be accompanied by 'artifi- 
cial' or reinforcing links, 
to preserve the individua- 
lity of the body (of the 
owner of such vertices) . 





3. More complete consideration of the shape of the segments is obtained as follows:

(a) For parallelism, by requiring that two segments be parallel only if one is a translation of the other. Generally, this is a comparison that takes a time proportional to the length of the segment. Chain encoding {Freeman} {Conrad} is suggested.
(b) For collinearity, by discovering properties or features that "carry through" or are common. Among these are:

1. Mathematical "regularity" of the segments. Both segments 
are described by the same or similar polynomials, etc. 

2. Heuristic properties: there must exist properties which 
will select with high probability the "right" continua- 
tion. 

3. Outside of the set of geometric properties, we have 
color, texture, etc. 




The same line disappears at b and appears
at c, making b and c "matching T's", but to
discover this fact it is necessary to have a
concept of "good continuation" or "good con-
tour".



Alternatively, we may forget these properties here and include them in models of our curved objects, but then we are forced to make searches in our scene like those made by DT or TD {my M.S. Thesis}.
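For 3(a), a Freeman chain code records a digitized curve as the sequence of moves between neighboring grid cells, numbered 0 to 7 counterclockwise starting from east. Because the code stores only the moves and not the starting cell, two curves that are translations of each other have identical codes, and comparing them takes time proportional to their length, as stated above. A minimal sketch, assuming both curves are already digitized as 8-connected lists of grid cells and traversed in the same direction (a reversed traversal would need the code reversed and each move rotated by 4).

```python
# Freeman's eight directions: index = code, value = (dx, dy) step on the grid.
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE = {step: i for i, step in enumerate(MOVES)}

def chain_code(cells):
    """Chain code of an 8-connected sequence of grid cells.
    Raises KeyError if two consecutive cells are not neighbors."""
    return [CODE[(x2 - x1, y2 - y1)] for (x1, y1), (x2, y2) in zip(cells, cells[1:])]

def translated_copies(c1, c2):
    """True if curve c2 is a pure translation of curve c1."""
    return chain_code(c1) == chain_code(c2)
```

The comparison is a single pass over the two codes, so its cost grows linearly with the length of the segments.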





Fig. 'SUITCASES' 

Heuristic properties of segments (yet to be 
determined) could select a "correct" match 
for endings a, b, ..., k,l. 
4. Bodies with no edges and no vertices are, in principle, easily identified by SEE. See fig. 'FRUIT'.




Figure 'FRUIT' 



The bodies have no curved edges and no vertices. The entire surface is smooth; there are no sharp edges or pointed corners. Examples: an inflated balloon, a frankfurter, a face, a cloud.

It is doubtful that we could do something here with SEE. We could try to postulate "artificial" vertices, using stereo perhaps, at the points where the 3-dim curvature is large, and then postulate lines between such vertices. This looks bad.

Or we could reason as follows: since these objects do not have vertices or edges, the only vertices appearing in the scene must separate two bodies. They will be mainly T-joints (cf. also page 46).

In principle, separation into bodies looks promising, but recognition (the answer to "what is the name of this object?") seems difficult. Moreover, it is not clear that with such a simple set of heuristics we could work successfully with objects as complicated as a human face, a blob of falling water, an amoeba, the surface of the sea (?).



At some point, we have to know what we want. As the complexity increases, the concept of "body" depends less and less on geometrical properties (disposition of edges, vertices, ...) and more and more on purpose (Is a skeleton an object? Or perhaps the femur bone alone? The answer varies with our intention, with the context).

Thus, models are necessary again. 
See also 'Do not use over-specialized assumptions. . .', page 252. 
REQUIREMENTS FOR THE PREPROCESSOR

APPENDIX TO SECTION ON CURVED OBJECTS
This appendix may be omitted in a first reading.

Requirements for the preprocessor. The preprocessor that feeds SEE has to find only:

1. The lines of the scene. 

2. The vertices. 

3. The local slopes at each vertex. 

4. See also comments to figure R17. 

5. Illegal scenes (page 217) should be detected by the preprocessor.



How bad will curved objects be? In objects where the curved edges are gently bent, SEE will work fairly well. The more an edge departs from its rectilinear equivalent, the worse SEE will work; T-joints will be difficult to find, a FORK may transform into a 'T', etc. (I am talking about the current SEE, described in the listings.)




Additional information could be used. So far, we are trying to identify objects on the basis of form alone, i.e., geometrical considerations. This is asking a machine to do more than a human being does. Ambiguous line drawings, such as ARCH, become unambiguous when we introduce shading, lighting, texture, color, etc. All of these properties could be used by SEE. In fact, consider how easy it would be to identify bodies if each one of them were of a different color (and we could sense that fact).

Psychological evidence. Knowledge of the algorithms used by human beings for shape continuation (page 188) is relevant. We quote from Krech and Crutchfield {1958}:



Grouping by Good Form. Other things 
being equal, stimuli that form a good figure 
will have a tendency to be grouped. This
is a very general formulation intended to 
embrace a number of more specific variants 
of the theme, traditionally classified as fol- 
lows. 

1. Good continuation. The tendency for
elements to go with others in such a way as 
to permit the continuation of a line, or a 
curve, or a movement, in the direction that 
has already been established (see Fig. 37c). 

2. Symmetry. The favoring of that 
grouping which will lead to symmetrical 
or balanced wholes as against asymmetrical
ones. 

3. Closure. The grouping of elements in
such a way as to make for a more closed or
more complete whole figure.

4. Common fate. The favoring of the 
grouping of those elements that move or 
change in a common direction, as distin- 
guished from those having other directions 
of movement or change in the field. 

It seems plausible to consider that the 
percepts resulting from all of the above 
determinants would be such as to meet the 
criterion of a good figure, that is, one that 
tends to be more continuous, more sym- 
metrical, more closed, more unified. 

Now the reader will see that a difficulty 
with this general proposition regarding 
grouping centers on the crucial phrase 
"good figure." How can we know which 
FIG. 37. Examples of grouping. In a, the dots are perceived in vertical columns, owing to their greater spatial proximity in the vertical than in the horizontal direction. In b, with proximity equal, the rows are perceived as horizontal, owing to grouping by similarity. In c, the principle of good continuation results in seeing the upper figure as made up of the two parts shown to the left below, even though logically it might just as well be composed of the two parts shown to the right below, or indeed of any number of other combinations of two or more parts. (Adapted from Wertheimer,
BOX 21

How to Measure "Goodness"



Attneave has made an ingenious experimental attack on the problem of measuring the "goodness" of a figure. The subject is given a sheet of graph paper composed of 4,000 tiny squares (50 rows by 80 columns). His task is to guess whether the color of each successive square is black, white, or gray. The experimenter has in mind what the completed figure will look like (Fig. a).

Without knowing what the completed figure will be, the subject starts by guessing the square in the lower left corner. When he has correctly identified the color, he moves on to guess the next square to the right. He continues this process to the end of the row and then starts on the left end of the next row above. In this manner he successively guesses each of the 4,000 squares.

On the average, Attneave's subjects made only 15 to 20 wrong guesses for the entire figure. How was this possible? The answer is that the figure was deliberately designed so that knowledge of part of the figure was sufficient to enable the subject to make fairly valid predictions about the remainder of the figure. This was accomplished by making all the white squares contiguous with one another, and similarly the black and the gray squares. Moreover, the contours separating the white, black, and gray areas are simple and regular. Where the figure tapers, it tapers in a regular way. And it has symmetry; after exploring one side, it is easy to predict the other side. Thus, the subject, having discovered that the first few squares are white, continues to guess white, and is correct until he hits the gray contour. After one or two errors, he then continues to guess gray. On the next row above, he tends to repeat the pattern of the first.

All these factors of compactness, symmetry, good continuation, etc., are aspects of what is implied by a "good figure." Thus an objective measure of the "goodness" of a figure is the ease with which the subject can predict its total form from minimal information about a part.

Other figures can be similarly tested. For example, figure b would prove to be a less "good" figure because the number of errors in guessing would be larger.

Attneave's particular method will not, of course, apply to all kinds of figures or all kinds of perceptual situations. But it does demonstrate that there are ways in which "goodness" can be objectively determined.
configuration of stimuli is "better" than
another?

To escape from this difficulty, we need to have independent criteria of what is a good figure. Some approach can be made to this; for instance, in the case of "symmetry" there are objective rules we can apply to determine the relative symmetry of various figures. The same is true of simple cases of "closure." (See Box 21 for a relevant experiment.)



But we are far from being able to state such criteria when we deal with the highly complex configurations of our normal perceptual experience. Part of the difficulty stems from the fact of individual differences among perceivers. One man's mess may be another man's order. And this may reflect the important role of learning and past experience in the genesis of "good figure."
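Attneave's measure, described in Box 21 of the quotation, can be imitated by a program that scans a figure in the same order and counts wrong guesses. The predictor below is an assumption loosely modeled on the subjects' reported strategy (repeat the square below if there is one, otherwise repeat the square to the left); it counts at most one error per square, whereas the subjects kept guessing until correct.

```python
def guessing_errors(grid):
    """Count wrong guesses made scanning the grid row by row, bottom row
    first, left to right (the order described in Box 21).

    grid: list of rows of color names, grid[0] being the bottom row.
    Hypothetical predictor: guess the color of the square directly below;
    in the bottom row guess the square to the left; guess "white" first.
    """
    errors = 0
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if r > 0:
                guess = grid[r - 1][c]
            elif c > 0:
                guess = row[c - 1]
            else:
                guess = "white"
            errors += guess != color
    return errors
```

A compact figure (left half white, right half black) costs a single error, while a checkerboard of the same size costs almost one error per square, in line with the idea that fewer errors mean a "better" figure.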
ON OPTICAL ILLUSIONS

Given the nature of SEE, we will restrict the meaning of 'optical illusion' to illusions formed by solids, that is, ambiguities or inconsistencies when we (or the program SEE) try to find 3-dim bodies in a scene; thus, the Müller-Lyer illusion ("A" in the topmost figure) is not considered.

According to this, we may elementarily classify the "scenes that are unlikely to occur" (that is, those that are not "standard" or "normal") into three types:

(1) Possible, but no "good" interpretation.

(2) Ambiguous: several good interpretations.

(3) Impossible: without interpretation.

Like POLYBRICK {Guzman}, SEE is not specifically designed to handle optical illusions. It was primarily designed to analyze "real world" scenes; hence, an input scene that produces an illusion (in a human) is not likely to occur as input to SEE. Nevertheless, in the same way that we may overtest a program for square roots by asking for the square root of 'APPLE', we will test SEE with some ambiguous scenes. Let us see what happens.



POSSIBLE BUT NO "GOOD" INTERPRETATION

Some objects do not 'make sense' because they violate rules that most objects obey. Nevertheless, it
ACTUAL IMPOSSIBLE TRIANGLE was constructed by the author and his colleagues.
The only requirement is that it be viewed with one eye (or photographed) from exactly
the right position. The top photograph shows that two arms do not actually meet. When
viewed in a certain way (bottom), they seem to come together and the illusion is complete.

(From Gregory).
One of the strong rules used by humans is that objects whose pictures show straight lines do indeed have straight edges; another strong rule is to assume the corners to be like the corners of a cube (faces meeting at right angles). Under these rules, the above triangle does not make sense, and people will classify it as an "impossible" object ('VARIANT' will be an "impossible" object; Penrose's Triangle will be "3 sticks forming an impossible configuration or scene", "mounted in a funny way", "cannot be seen as representing a single object lying in space"). For instance, Gregory {Scientific American} tries to explain that the triangle has a real 3-dim object as originator, by constructing a body consisting of three rectangular parallelepipeds ("bricks") joined at right angles, and then taking a picture from a special direction, so that the free ends a and b seem to touch:



Fig. 'VARIANT'



These rules (faces meet at right angles; straight lines mean straight edges) are deeply ingrained in people, but nature need not always follow them. The Penrose Triangle can be obtained by photographing a 3-dim triangle with curved edges and skewed corners, where each side touches the other two.

SEE finds three objects in figure 'Penrose Triangle.' 
Other examples follow. 




Figure 'BLACK' 

People assume that faces meet at 
right angles, and this object 
violates that rule, making it 
"impossible" or odd-looking. 
It is possible to construct object 'BLACK' with planar faces. See figure 'TEST OBJECTS', page 209. SEE finds one body in 'BLACK'.

The object at right looks
impossible if we assume all
faces to be flat. If face aeb
is curved, the object is plausible.
R is its reflection in mirror
M, and Q a smoother version
of R. Q looks "normal"; by
deforming Q we could obtain R.

Unlike humans, SEE does not 
hold these "very common rules" 
as inviolable; SEE does not 
have any special problems with 
these "strange but true" 
objects. 

A misleading suggestion of
superiority should not be drawn
from these rare cases; in other
situations SEE makes mistakes 
that a human being does not 
(see figure 'SPREAD'). 

Of course, SEE holds its own rules (for example, those of table 'Global Evidence') as inviolable; hence, given a "rare enough scene", it will make mistakes (cf. the assertion after the Theorem). This is a similarity of behavior, I think, between people and SEE: each one follows rather rigidly a small set of rules (see also the conclusion at the end of this section).

Besides, humans will often see the 'impossible' object as an object, doing SEE's job just as well.
Figure 'STAIRCASE'

Figure 14. 'Impossible Object.' This can be drawn, but it corresponds to no possible physical object. (From Penrose, L. S. and Penrose, R. (1958). Brit. J. Psychol., 49, 31.)

(caption by Gregory)



The "always descending staircase." {Gregory, in {Foss}}
The caption is wrong: this object could be constructed in the real world, if some surfaces are curved and/or the faces at the corners do not meet at right angles. It is an example of an object "possible but without 'good' interpretation." See also the Metatheorem. Again, the "impossibility" or oddness of 'STAIRCASE' comes from holding as inviolable the rules 'straight lines in the drawing correspond to straight edges in 3-dim' and 'faces meet at right angles, like the corners of a cube'.

AMBIGUOUS: TWO GOOD INTERPRETATIONS

These are scenes that can be interpreted in several correct (non-paradoxical) manners, which are also "sensible" (as opposed to the Trivial Solution given by the metatheorem).

For instance, a scene like




that can be interpreted as 
(A)



or as 




(B) 



SEE will generally give one of the possible answers, although not necessarily the one preferred by humans. In this example, SEE chose (B).

The following scene, locally ambiguous, is correctly parsed by 
our program. 





Sometimes, the conservatism of SEE and its partial inability to make very global judgments will leave a body unconnected; for instance, the three faces of one cube below will each be reported as a separate object, due to insufficient links.








IMPOSSIBLE: WITHOUT INTERPRETATION

Images that cannot be the product of photographing (projecting) a 3-dim scene. These objects do not have physical existence.



This scene is without 
interpretation, meaning 
no 3-dim scene (with 3-dim 
bodies) could have 
produced it. 




In figures like the above one, humans are unaware of the extent of the background, and the figure seems to make sense even if B is background. SEE is unable to make this mistake, and its analysis of the scene will reflect the fact: the preprocessor will complain that one region, the background, is a neighbor of itself. See the comments to scene R3, page 113.

Of course, in these cases there is no answer to the question 
"which are the bodies in the scene?" Whatever answer SEE (or anybody 
else) gives, it is wrong. 

Nevertheless, according to our metatheorem (page 33), there is an extremely easy way to discover and reject these impossible scenes: all of them are necessarily illegal scenes (q.v., page 217). And we know how to detect illegal scenes; SEE (or rather its preprocessor) already does that.

SEE detects all impossible scenes by refusing the data as an illegal scene.
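The complaint mentioned above, a region being a neighbor of itself, is easy to state in code. A minimal sketch, assuming each line of the scene is given with the names of the regions on its two sides; it flags any line with the same region on both sides. This is only the single condition quoted from the comments to scene R3, not the full definition of illegal scenes (page 217), and the function names are hypothetical.

```python
def self_neighboring_lines(lines):
    """Return the lines whose two sides belong to the same region.

    lines: dict mapping a line name, e.g. ("A", "B"), to the pair of
    regions (region_on_one_side, region_on_other_side).
    """
    return [name for name, (r1, r2) in lines.items() if r1 == r2]

def passes_self_neighbor_check(lines):
    # Only this one legality condition is checked here.
    return not self_neighboring_lines(lines)
```

A scene in which some line has the background on both of its sides would be refused at this point, before SEE looks for bodies.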
A PROGRAM TO DISCOVER HUMAN OPTICAL ILLUSIONS

Some scenes get classified by our metatheorem as 'possible but no "good" interpretation', and likewise by SEE, which does not refuse to analyze any legal scene.

Nevertheless, a person will stubbornly classify them as 'odd-looking' or 'not making sense' or 'impossible', even if we teach him the solution obtained by SEE (figures 'Penrose Triangle', 'Black', 'Staircase', 'CONTRADICTORY').





Figure 'CONTRADICTORY' 

One object is found by SEE: (:1 :2 :3 :4).
Since it is a legal scene, SEE
classifies it as 'possible but no "good"
interpretation'. A person will classify
it as "not making 3-dim sense": a human
optical illusion. Is it possible to
reconcile these views?

Of course, the metatheorem ensures that there is at least one solution, so SEE's interpretation is "right" (it has chosen one correct answer, generally not the trivial solution given by the metatheorem), and the mortal is wrong. Also, the theorem of page 90 ensures that any system (human or computer) that uses too "local" rules (see fig. 'MACHINE') will make at least one mistake, no matter what rules he (or it) uses.
H-optical illusions

There is thus a disagreement between SEE and our fellow subject, because SEE has classified the scene as 'possible but no "good" interpretation' and our man has said 'contradictory as a three-dimensional scene'. Let us call these human optical illusions (such as 'Contradictory', 'Staircase', etc.) h-optical illusions.

What to do in these disagreements? Who is right? 

SEE is right. The above comments seem to indicate that the electronic data processor is correct. The human has used excessively "local" rules. That being the case, we can teach and train our subjects (if avoiding future errors is desirable) to "understand", rationalize, and make sense of these h-optical illusions. Indeed, that is what is tried in figures 'Black', 'Penrose Triangle', etc. This training is possible: different people may show different degrees of (h-optical) illusion before training and after training (see Box).

In other words, if SEE is right, the computer scientist has
nothing to do; it is all up to the psychologists and educators.

Man is right. We may hold the view that the human answer is still
preferable. Then, to our relief, man is right and SEE is wrong.
It is (perhaps) necessary to modify and correct SEE, so as to emulate
human behavior. We suggest a way to do this.

A program to discover h-optical illusions

It is possible to enable SEE to detect these h-optical illusions, so
that it will classify the legal scenes into "possible" or "h-optical
illusions."

SUGGESTION

Like the problem of discriminating between background and objects
(see section 'On background discrimination by Computer'), this is an
interesting project from the "psychological" point of view but, as in
the background case, it is not essential at the moment for our
vision-robot work.



* Strictly, there is a third possibility: both are wrong.
BOX



There is generally a wealth of available information— though none entirely 
reliable— for settling the size and distance of external objects, with sufficient 
precision for normal use. As is well known, the visual system makes use of 
a host of 'depth cues', such as gradual loss of detailed texture with increasing 
distance, haziness due to the atmosphere and nearer objects partly hiding 
those more distant. These cues were discussed in the nineteenth century 
by the great von Helmholtz (1925), who fully realised their importance, and 
they have been the subject of many investigations since, especially by 
J. J. Gibson (1950). Whatever the richness of depth cues, however, the visual 
input is always ambiguous. Though the brain makes the best bet on the 
evidence — it may always be wrong. 

The kind of mistakes which occur when the bet is on the favourite though 
the favourite is not placed, is shown most dramatically by the demonstrations 
of Adelbert Ames (1946). The most impressive demonstration is given 
simply with a room which is non-rectangular, but so shaped that it gives the 
same retinal image as a rectangular room to an eye placed in a certain 
position. Now clearly this room, though queer shaped, must appear the 
same as a normal rectangular room, for it gives the same image to the eye. 
But consider what happens when objects are placed inside the Ames room. 
The further wall recedes at one side, so that an object or person standing in 
one corner is actually at a different distance than is a second object placed 
at the other far corner. These objects (or people) appear, however, to be 
at the same distance— and they are seen the wrong size. This is clear evidence 
that we assume rooms to be rectangular (because they usually are) and we 
interpret the size of objects according to their distance as given by this 
assumption. When the assumption is wrong we see wrongly. What Ames 
did was to rig the odds, and then we make the wrong decision on size and 
distance. A child may appear larger than a man. We may know this is 
absurd and yet continue to see a bizarre world. The retinal image is all 
right, but the odds have produced the wrong internal file cards and then the 
human seeing machine is upset, and gives a wrong answer. 

It is interesting that the Ames room is seen correctly by peoples, such as
the Zulus, brought up in a 'circular culture' of beehive huts where there are 
few reliable perspective features, such as rectangular corners and parallel 
lines, in their visual environment. To the Zulus, the odds are not rigged by 
the Ames room— to them this is not misleading perspective. They are not 
subject to this illusion, but accept the room as the shape it is, and see the 
objects in it correctly in distance and size. This is a matter of very real 
importance. It shows that when we are transferred to an alien or bizarre 
environment, where our filing cards are inappropriate, we interpret the 
images in the eyes according to principles found reliable in the previous, 
familiar world, but now they may systematically mislead and then
perception goes wrong. Space travellers beware! {Gregory, in {Collins
and Michie}}
A possible way to attack the problem is:

(1) To identify each link with whoever proposed it. 

(2) To set up systems of simultaneous "symbolic" equations. 

(3) To solve them by elimination.
We elaborate: 

(1) Mark each link with the name of the heuristic that produces it.
After obtaining the 'maximal' nuclei by GLOBAL and LOCAL, several
links are left (for example, three in fig. 'FINAL-BRIDGE') and
ignored by the current SEE. Instead, one could see what kind of
links they are, and in this way one has more information about the
type of contradictions in the scene.

(2) Introduce a 'conditional' link: regions :1 and :2 belong to
the same body if region :3 does not. An OR link is now possible
by use of the conditional, since $a \Rightarrow b \equiv b \lor \lnot a$.

(2.3) Introduce a 'NOT' link: :3 ≠ :5, regions :3 and :5 do not
belong to the same body.

(2.6) As in ordinary algebraic equations, a system of n simultaneous
equations means that all of them must be satisfied; the "AND" of
all must be true. Thus, AND is implicit in our notation. So far,
we have OR, AND, NOT, IMPLIES (conditional): more than necessary.

At the end, we have a system of simultaneous equations like these,
where :1 = :2 means both belong to the same body; this is an
equivalence relation, so I use the = sign:

$$
\begin{aligned}
{:}1 = {:}2 \;&\text{OR}\; {:}3 = {:}5 \\
{:}3 \neq {:}2 \;&\Rightarrow\; {:}1 = {:}4
\end{aligned}
\qquad \text{(E)}
$$
We now proceed to "solve" these equations. Three things could happen:

== Exactly one solution is found. This is the normal case, and that
solution tells what the bodies are. Familiar, "clear", possible
scenes will fall in this case.

== More than one solution consistent with our equations is found.
All are reported. This is the case "Ambiguous: several good
interpretations."
== No solution is found. This is a genuine h-optical illusion,
corresponding to a contradiction in the equations. For instance, in
fig. 'CONTRADICTORY', equations set by the T-joints between :2 and
:3 would be inconsistent with those set by the Arrows and Forks.

How to solve the equations (E). By the solution to (E) we mean a
division of the scene (:1, :2, ..., :n) by means of a partition of
the form

(:1 = :5 = :7 = :6),
(:3 = :2),
(:4)

which is consistent with (E).

In the current SEE:

(a) The equations are only equalities: :1 = :2. Equations of the
type :1 ≠ :2 are also taken into account, through inhibitory
mechanisms such as NOSABO. No conditional links exist.

(b) Since all equations are of the type :2 = :3, the solution is
obtained by applying transitivity (parentheses indicate nuclei):

$$1 = 2,\quad 2 = 3 \;\Longrightarrow\; (1 = 2 = 3)$$

except that we require two antecedents (two strong links) for each
application of transitivity:

$$1 = 2,\quad 1 = 2 \;\Longrightarrow\; (1 = 2)$$

$$(1 = 2),\quad 1 = 3,\quad 2 = 3 \;\Longrightarrow\; (1 = 2 = 3)$$
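The two-antecedent rule in (b) can be illustrated with a short Python sketch. It is a simplified illustration, not SEE itself: it assumes links are plain region pairs (a repeated pair counts as two links) and ignores SEE's distinction between strong and weak links and its GLOBAL/LOCAL phases. The name `form_nuclei` is hypothetical.

```python
def form_nuclei(regions, links):
    """Group regions into nuclei, merging two nuclei only when at least
    two links join them (the 'two antecedents' rule described above)."""
    nuclei = [{r} for r in regions]

    def links_between(a, b):
        # Count the links with one end in nucleus a and the other in nucleus b.
        return sum(1 for x, y in links
                   if (x in a and y in b) or (x in b and y in a))

    merged = True
    while merged:
        merged = False
        for i in range(len(nuclei)):
            for j in range(i + 1, len(nuclei)):
                if links_between(nuclei[i], nuclei[j]) >= 2:
                    other = nuclei.pop(j)  # j > i, so index i is unaffected
                    nuclei[i] |= other
                    merged = True
                    break
            if merged:
                break
    return nuclei


# Two links between 1 and 2 form (1 = 2); then 1 = 3 and 2 = 3 bring in 3.
print(form_nuclei([1, 2, 3], [(1, 2), (1, 2), (1, 3), (2, 3)]))  # [{1, 2, 3}]
# Single links are not enough: every region stays alone.
print(form_nuclei([1, 2, 3], [(1, 2), (2, 3)]))                 # [{1}, {2}, {3}]
```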
An exhaustive search (which successively tests each possible
partition) for the solution to (E) is impractical except in very small
scenes, and heuristic methods are needed.
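To make the cost of exhaustive search concrete, here is a minimal Python sketch (the names `partitions`, `solve` and `verdict` are hypothetical) that enumerates every partition of the regions, keeps those consistent with a list of constraints, and applies the three-way classification given earlier. Encoding each equation as a Python predicate is an assumption of the sketch. The number of partitions grows as the Bell numbers (52 for five regions, 115,975 for ten), which is why the search is impractical beyond very small scenes.

```python
def partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + part


def solve(regions, constraints):
    """Return every partition (candidate set of bodies) satisfying all
    constraints. Each constraint receives `same(a, b)`, which tells whether
    regions a and b fall in the same body; the list is implicitly ANDed."""
    solutions = []
    for part in partitions(list(regions)):
        body = {r: k for k, block in enumerate(part) for r in block}

        def same(a, b):
            return body[a] == body[b]

        if all(constraint(same) for constraint in constraints):
            solutions.append(part)
    return solutions


def verdict(solutions):
    if not solutions:
        return "no solution: h-optical illusion"
    if len(solutions) == 1:
        return "one solution: possible scene"
    return "several solutions: ambiguous"


# The two equations (E); the conditional 3 != 2 -> 1 = 4 becomes 3 = 2 OR 1 = 4.
E = [lambda same: same(1, 2) or same(3, 5),
     lambda same: same(3, 2) or same(1, 4)]
print(verdict(solve([1, 2, 3, 4, 5], E)))  # several solutions: ambiguous
```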

I suggest starting from equalities such as

1 = 2
2 = 3

and forming nuclei with the current SEE, except that at each step
we check whether our current nuclei satisfy all of (E); for
disjunctive equations such as "4 = 5 OR 6 ≠ 7 OR 4 = 6"
we try each branch of the OR in turn, rejecting those that lead to
no solution (this may be pretty combinatorial, too).

Perhaps it is possible to use more logic here: some sort of
theorem proving.

Conclusions and conjectures

The similarities between SEE and people (see also 'Human perception
vs. computer perception', page 254) stem from the fact that, like SEE,
people seem to use only a small number of rules (although not
necessarily those used by SEE), which work in almost all cases; but
when these rules lead to an ambiguity or inconsistency ("conflicts"),
there is reluctance to abandon them, and mistakes or impossibilities
are produced.

It is possible that, like SEE, people use primarily local clues,
and only less frequently more global information, to disambiguate
interpretations. I think that, in the presence of objects not seen
before (in 2-dim line drawings, such as 'MOMO', for instance), humans
follow general rules not unlike those used by SEE to distinguish
or decompose a scene into bodies. Rules that apply to all polyhedra
have to be invoked, since in the presence of previously unseen
objects humans cannot use a model of the object.

The more familiar an object is (or if we have reason to suspect or
expect it), the faster we abandon the general rules and propose its
model as a possible explanation of part of a scene; we then jump to
a model-matching routine (à la OT {MAC TR 37}) that tries to fit the
model to part of the scene (to a semi-isolated body); general rules
à la SEE prevent us from overflowing with our model into other bodies,
and help us deal with partially occluded bodies.
ON NOISY INPUT

The performance of our programs is analyzed when the data has
imperfections consisting of (1) misplaced vertices, (2) missing
edges, (3) spurious extra lines, (4) missing faces, (5) two vertices
merged.

The section 'Analysis of Many Scenes' contains results of SEE 
when applied to imperfect scenes. 



Summary

It is easy to predict the operation of SEE when the two-dimensional
data supplied is clean, in the sense of being an accurate
representation of the three-dimensional scene.

In practice, of course, errors will occur in the data and it
becomes important to know how sensitive our program is to them.

SEE has some serendipity. Many of the imperfections in the 
data do not cause mistakes in the linking procedure, or the link 
misplacements are not enough to cause erroneous identification. 
But mistakes are made. 

Here is how different types of imperfections are handled: 

== The assignment of types to vertices is highly insensitive to errors
in the position of each vertex, except for T's that become Forks or
Arrows. Two cures for the exceptions were found, only the first
of which is implemented:

(1) Allow tolerances in concepts of parallelism and colinearity. 

(2) Allow a long but slightly twisted rectilinear segment to be 
"straightened", as indicated in comments on scene R17. 

== Missing edges are subdivided into three classes (discussed below);
two of them produce recoverable or detectable errors (hence,
susceptible of correction or prevention). It will be difficult to
detect whether a segment of the third class is missing; these will
produce recognition mistakes.

== Additional lines, like the ones caused by edges of shadows, are not
easily detected as spurious or superfluous. Their presence mainly
produces a diminution in the number of useful links, thus sometimes
causing too conservative behavior, i.e., proposition of too many
bodies.

== Whole faces may be missing. Ordinarily (see scenes L2, L9T),
the remaining part of the body gets correctly identified.

OBTAINING THE DATA 

The scenes analyzed by our program in this thesis were obtained
by one of two methods:

By free drawing. A line drawing representing three-dimensional objects
was made; the coordinates of each vertex were accurately measured (or
computed) and the information was put in the 'Input Format' form
previously described. Also, the regions belonging to the background
were indicated as such.

These scenes have mnemonic names such as TRIAL, BRIDGE, etc.

What kind of projection did you use? Were these isometric drawings?
Since no assumption is made about the rectilinear objects being drawn,
the drawings are not isometric, or perspective, or ... projections.
They could be any of them. It is not assumed that "we are dealing
with prisms, with faces of a body meeting at right angles (like the
corners of a cube)," or with convex objects. Neither the drawings nor
the program make any assumption of this type. If the reader wishes
to adopt the assumption specified above in quotation marks, then the
drawings will correspond to orthogonal projections of three-dimensional
scenes.

No support hypothesis is needed: if necessary, the objects could
be floating in a transparent fluid of their same density.

By construction. Arbitrary but not too complicated objects were cut
from pine wood, with flat surfaces, and painted black. Their edges
were painted white. By placing them on a black table (see first few
pictures of this thesis) in different positions and combinations,
three-dimensional scenes were created (see figure 'TEST OBJECTS').
Pictures were taken with high-contrast film, slightly under-exposed
so as to render black everything but the lines. Diffuse illumination
eliminated shadows. [Great help was received in the pictorial task
from Messrs. William H. Henneman, Devendra D. Mehta and David Waltz,
and is here acknowledged.] The photographs were taken with a depression
angle from 45° to 90° (that is, looking down), a 50 mm focal length
lens and a 35 mm camera (standard equipment).

The size of the prints is approx. 8½ by 11 inches (21.5 by 28 cm).
If some lines were not clear, they were retouched with white ink.
If some lines were missing, they were NOT added.

The pictures have names like L2 or R3, a letter and a digit.
Most of them are stereographic pairs, taken with both cameras having
parallel optical axes and the sensitive film on the same plane.
SEE only analyzes one scene at a time, so the left picture is not
consulted when SEE analyzes the right picture, and vice versa.

A transparent millimetric mesh is laid on top of the prints,
and the coordinates are read by eye and put by hand into the 'Input
Format' form. The thickness of each line is about 1 mm (see figure
'TEST OBJECTS'); typically, the size of a scene is 10 or 15 cm, so a
minimum error of ±1 per cent in the coordinates of a vertex is
already present. The slopes and directions of short segments suffer,
naturally, much greater errors. Also, if two vertices are too close
together (about two millimeters), they are merged and coded as one.
We are simulating the kind of mistakes that are likely to occur.

Also, some bias is introduced, no doubt, by the human operators.
[Immense help in reading the coordinates of most of the scenes was
given by Miss Cornelia A. Sullivan and Mr. Devendra D. Mehta; the
author acknowledges it.]

Irrespective of the generation method, the scenes that appear in
this thesis were drawn in their final form by the PDP-6 computer
through a Calcomp plotter, and then inked and finished by hand.

Thus, it is possible to perceive in many of them the imperfections
of the data that SEE had to analyze.
MISPLACED VERTICES

The coordinates of a vertex may contain a small error or 'noise'.
How does this affect the type of a vertex? Does the type change?

    Vertex type    Effect of a small displacement
    L              Not affected
    FORK           Not affected
    ARROW          Not affected
    K              Transforms into MULTI
    X              Transforms into MULTI
    T              Transforms into ARROW or into FORK
    PEAK           Not affected
    MULTI          Not affected



Many types are unaffected. Type K vertices transform into
MULTI, but since K's are seldom used by SEE, this is no big loss.

X's transform into MULTIs, and we lose two links here, which
makes SEE behave more conservatively. GOODT also gets affected
(though not much).

The serious changes are the T's that get transformed into ARROWs
or FORKs, when these T's are matching T's. Because matching T's are
used for linking otherwise disconnected pieces of a body, their loss
generally implies the partition of a body into two. See figure
'DISCONNECTED'.
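Why a displaced T turns into an ARROW or a FORK can be seen with a small classifier. The sketch assumes the junction is given as three direction vectors leaving the vertex and uses the angle-based definitions (T: widest angle equal to 180°; ARROW: widest angle over 180°; FORK: all angles under 180°). It treats the widest angle as 180° when it deviates by at most arcsin(sinto), which agrees with the sine test for parallelism described later in this section; the default 0.15 is the SINTO value quoted there. The function name is hypothetical.

```python
import math


def junction_type(directions, sinto=0.15):
    """Classify a three-line junction as 'T', 'ARROW' or 'FORK'.

    `directions` holds three (dx, dy) vectors pointing from the vertex
    along its three lines."""
    angles = sorted(math.atan2(dy, dx) % (2 * math.pi) for dx, dy in directions)
    # Angles between consecutive lines, going once around the vertex.
    gaps = [(angles[(i + 1) % 3] - angles[i]) % (2 * math.pi) for i in range(3)]
    widest = max(gaps)
    # Widest angle within arcsin(sinto) of 180 degrees: two lines are collinear.
    if abs(widest - math.pi) <= math.asin(sinto):
        return "T"
    # All three lines inside a half-plane: ARROW; otherwise they spread out: FORK.
    return "ARROW" if widest > math.pi else "FORK"


# A perfect T, then the same T with its vertex slightly displaced.
print(junction_type([(1, 0), (-1, 0), (0, 1)]))                 # T
print(junction_type([(1, 0), (-1, 0.1), (0, 1)], sinto=0.05))   # ARROW
print(junction_type([(1, 0), (-1, -0.1), (0, 1)], sinto=0.05))  # FORK
```

With the default tolerance the displaced junctions above are still read as T's, which is the effect of the tolerance cure listed earlier.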
Figure 'DISCONNECTED'

The T's under discussion are marked by
small circles (•). In (a), the
misclassification of these T's into Arrows
or Forks does not break the occluded
body, which retains its unity thanks to
il. In (b), the same misclassification
does break the occluded body, reporting
two objects instead of one, a possible
but less desirable answer. If the T's
are not matching T's, as in (c), their
misclassification does not matter.



The loss of matching T's makes the program more conservative in
some cases. In some sense (see 'Desirability Criterion') this is
tolerable.

What other perils does the misclassification of the T's bring? We
should worry if, due to errors caused by T's, the occluded body joins
the occluding one.



DESIRABILITY CRITERION. 

(1) We would like a SEE that never makes mistakes. Since this is
not possible, then

(2) We would like it to make mistakes of only one kind: either
joining two bodies that should be left separated (intrepid,
cavalier behavior), or leaving unattached two nuclei that
should be reported as a single object (conservative behavior).

(3) Of the two, we prefer a conservative SEE, because its errors
will be easier to correct (cf. Stereo Perception).




The T's should not cause :1-2-3 to be
reported as part of one body.



Each T, when perturbed, will go to one of these states: (N) normal,
remaining a T; (L) "left", becoming a FORK; or (R) "right",
becoming an Arrow.

For three T's of an occluded body, 3³ = 27 states are possible.
They are shown on the next page, in table 'THREE Ts'.

How many of these 27 states will produce mis-links joining 1 with 3,
2 with 3, 1 with 4 or 2 with 4 (none of the four regions is
necessarily background)?

None.

The reason is that (see the description of NOSABO) a T, an Arrow
or an L inhibits the link shown below,

so that

(a) An Arrow in position (I) [or (III)] suggests linking 1 with 4.
This link is inhibited by the L at IV [or VI].
Example: figure RLL in table 'THREE Ts'.

(b) A Fork in position (I) [or (III)] suggests
    (i) linking 1 with 3: inhibited because of the T or Arrow in
        vertex II;
    (ii) linking 1 with 4: inhibited because of the L in IV;
    (iii) linking 4 with 3: depends on outside considerations,
        discussed below.
    Example: LRL.

(c) An Arrow in position (II) suggests linking 1 with 2: inhibited
or allowed according to vertex V. Example: RRL.

(d) A Fork in position (II) suggests
    (i) linking 1 with 3: inhibited by the T or Arrow of I;
    (ii) linking 2 with 3: inhibited by the T or Arrow in III;
    (iii) linking 1 with 2: inhibited or allowed according to
        vertex V.
    Example: RLN.

Thus, no link is possible, even under these "noisy" circumstances,
between 1 and 3, 2 and 3, 1 and 4, or 2 and 4. That is, the 27 cases
of table 'THREE Ts' are treated correctly.
A possibility of bad linking exists between 4 and 3 in this
case, if two T's convert into Forks and "help each other":

Two links cause 4 and 3
to be joined.



Rather than get involved in this sub-problem, we will point
out two solutions to the misplaced vertices: (1) by allowing some
tolerance in 'parallel' and 'collinear'; (2) by 'straightening out'
crooked or twisted segments. We explain.

Equal within epsilon (definition). $a$ is equal within epsilon to $b$,
written $a \approx b$, iff $|a - b| < |\epsilon|$. Generally, $\epsilon > 0$.

Tolerances in collinearity and parallelism. Two lines are parallel if
the sine of the angle formed by them is smaller than SINTO (that is,
$\sin\theta \approx 0$). Currently, SINTO = 0.15.

Lines ab and bc are collinear if length ab + length bc ≈ length ac,
within the tolerance COLTO. Currently, COLTO = 0.05.

We have implemented these definitions. Better definitions exist. 
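A minimal Python sketch of the two tolerance tests, using the SINTO and COLTO values quoted above. The text does not say whether COLTO is an absolute or a relative tolerance; this sketch assumes it is relative to length ac. The function names are hypothetical.

```python
import math

SINTO = 0.15  # parallelism tolerance quoted in the text
COLTO = 0.05  # collinearity tolerance quoted in the text (used as relative: an assumption)


def parallel(p1, p2, q1, q2, sinto=SINTO):
    """True if segment p1-p2 is parallel to segment q1-q2 within tolerance."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    # |u x v| / (|u| |v|) is the sine of the angle between the two lines.
    sine = abs(ux * vy - uy * vx) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return sine < sinto


def collinear(a, b, c, colto=COLTO):
    """True if ab and bc are collinear: length ab + length bc ~ length ac."""
    ab, bc, ac = math.dist(a, b), math.dist(b, c), math.dist(a, c)
    return abs(ab + bc - ac) <= colto * ac


print(parallel((0, 0), (10, 0), (0, 1), (10, 2)))  # True: sine is about 0.0995
print(collinear((0, 0), (5, 3), (10, 0)))          # False: b is far off line ac
```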

These definitions allow most small inaccuracies in the coordinates
of vertices to pass unnoticed. Although they are giving reasonable
service, they are only temporary, since by relaxing the criteria for
parallelism and collinearity too much, strange things could happen
(fig. 'CROSSED').

Fig. 'CROSSED'

A too lenient definition of parallel
and collinear could give the following
matching T's: a to d, b to f, c to e.

See also, in section 'Analysis of many scenes', the comments to L9 and R9T.
Straightening twisted segments

The definitive cure is simple: reassign the slope of bc to be that
of ad, if bc is small, ad is large, and the angles at b and c are
close to 180°. See also the comments to figure R17. This has not
been implemented. In this way, all cases of table 'THREE Ts' will be
solved. See also the comments to scene R4.

Probably the preprocessor will automatically take care of this
rectification, since it may prefer to give a long segment ad instead
of three almost collinear shorter segments ab, bc, cd.
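A minimal Python sketch of the straightening test. It takes the preprocessor route just mentioned (replacing the chain a-b-c-d by the single segment ad) rather than only reassigning the slope of bc. What counts as "bc small, ad large" (`short_ratio`) and "close to 180°" (here, turning sine below 0.15, borrowing SINTO) are assumptions, and the function names are hypothetical.

```python
import math


def _turn_sine(p, q, r):
    """Sine of the turning angle at q when walking p -> q -> r, and whether
    the walk keeps going forward (positive dot product)."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - q[0], r[1] - q[1]
    sine = abs(ux * vy - uy * vx) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return sine, ux * vx + uy * vy > 0


def straighten(a, b, c, d, short_ratio=0.2, sinto=0.15):
    """Replace the chain a-b-c-d by the single segment a-d when bc is short
    compared with ad and the chain bends only slightly at b and at c."""
    if math.dist(b, c) > short_ratio * math.dist(a, d):
        return [a, b, c, d]
    for p, q, r in ((a, b, c), (b, c, d)):
        sine, forward = _turn_sine(p, q, r)
        if not forward or sine >= sinto:
            return [a, b, c, d]
    return [a, d]


print(straighten((0, 0), (4, 0.1), (5, 0), (10, 0)))  # [(0, 0), (10, 0)]
```

As Fig. 'TWISTED' warns, this replaces measured vertices by idealized ones, so a genuinely bent object would be falsified.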

Since the straightening of a segment replaces some known vertices 
(which we suppose inaccurate) by other idealized vertices, we may be 
introducing uncertainty, in the form of non verified hypotheses, to our 
data. The object in the scene could really be "crooked" or twisted. 





Fig. 'TWISTED'

The object on the left is really bent as shown.
If we idealize it as on the right, we are
falsifying the information about it.

By replacing it by an idealized version, we may be creating 
problems for its identification, when we want to assign a name to it. 
But notice that the 'unbent' version or idealization is handier for 
SEE. 

If the information is very bad. Throw it away and read the scene
again. A simile indicates that the issue becomes one of allocation
of resources: if you receive a written message containing a few
wrong characters and missing words, you may use your brains and time
to deduce the omitted portions (by employing the redundancy, for
instance). If the dispatch is very garbled, you might as well request
a new one.

Summary. It is known how to handle small inaccuracies in the position
of the vertices.

MISSING EDGES 

From time to time, an edge will fail to show up in the scene,
and the questions are (1) how much harm will be produced, and (2)
how we can detect and correct the anomaly. An example appears on
page 141.

Illegal scenes. Lines that end abruptly produce illegal inputs,
suggesting that segments are missing.

Fig. 'ILLEGAL'

In (a), a vertex has one edge.
In (b), the network can be separated by erasing just one edge.
Both are illegal scenes, indicating missing or extra lines.

Also (figure 'ILLEGAL', (b)), a region cannot be a neighbor of
itself: another irregularity that points to deficient data. Cf.
comments to scene R3.

These constraints can be nicely exploited by a preprocessor. 
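Both irregularities of figure 'ILLEGAL' can be tested on the vertex-edge graph of a scene. A minimal Python sketch, assuming the scene is a list of undirected edges between vertex names with no repeated edges: a vertex of degree one is case (a), and an edge whose removal disconnects the drawing (a bridge, found here with the standard depth-first low-link method) is case (b), where the same region lies on both sides of the edge. The function name is hypothetical, and a real preprocessor would also need geometric checks.

```python
from collections import defaultdict


def illegal_features(edges):
    """Return (dangling_vertices, bridges) of an undirected line drawing."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dangling = sorted(v for v in adj if len(adj[v]) == 1)

    disc, low, bridges = {}, {}, []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for w in adj[u]:
            if w not in disc:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # No back edge from w's subtree reaches u or above: u-w is a bridge.
                if low[w] > disc[u]:
                    bridges.append(tuple(sorted((u, w))))
            elif w != parent:
                low[u] = min(low[u], disc[w])

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return dangling, sorted(bridges)


# A square with a dangling line: both defects are reported.
print(illegal_features([(1, 2), (2, 3), (3, 4), (4, 1), (4, 5)]))  # ([5], [(4, 5)])
```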

Line proposer and line verifier. A line proposer is a program that
suggests places where a line can be missing; a line verifier is
essentially a precise line finder that searches for a line in only a
small portion of the scene, as told by the line proposer.

In the body of this section we will develop several heuristics for
use in a line proposer. The verifier is not discussed.

Blum's line proposer. An algorithm has been designed by Manuel Blum
{1968} that will detect many places where lines are possibly missing.
It suspects concave regions. An angle bigger than 180° triggers a
search for the omitted line in directions parallel to the neighbor
edges (fig. 'BLUM'). It also triggers searches along its own edges.
In other conditions, a vertical line is searched for.

Figure 'BLUM'

Region 12 is suspected to contain undetected lines,
because it is concave. Vertex V is chosen because
its internal angle is bigger than 180 degrees.
From it, Blum's proposer will suggest to the line
verifier to look for lines in directions VA' and
VB' (broken lines), parallel to the neighbor edges
A and B. It also searches (dotted lines) along
the continuation of lines C and D.

No harm is done by a bad proposer. Only some time is wasted. 
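The reflex-vertex part of Blum's heuristic can be sketched in Python as follows; it is an illustration, not a reproduction of Blum's algorithm. Assumptions: the region is a simple polygon listed counterclockwise, the "neighbor edges" A and B are the boundary edges adjacent to the two edges C and D that meet at V, and a proposed direction is kept only if it points into the region. The "vertical line" rule is not covered, and all names are hypothetical.

```python
import math


def _cross(u, w):
    return u[0] * w[1] - u[1] * w[0]


def _ccw_angle(u, w):
    """Counterclockwise angle from u to w, in [0, 2*pi)."""
    return math.atan2(_cross(u, w), u[0] * w[0] + u[1] * w[1]) % (2 * math.pi)


def _unit(v):
    n = math.hypot(*v)
    return (round(v[0] / n, 6), round(v[1] / n, 6))


def propose_lines(polygon):
    """Map each reflex vertex index of a CCW polygon to search directions."""
    k = len(polygon)
    proposals = {}
    for i, v in enumerate(polygon):
        p, n = polygon[i - 1], polygon[(i + 1) % k]
        incoming = (v[0] - p[0], v[1] - p[1])
        outgoing = (n[0] - v[0], n[1] - v[1])
        if _cross(incoming, outgoing) >= 0:
            continue  # convex or straight vertex: nothing suspicious here
        e1, e2 = outgoing, (p[0] - v[0], p[1] - v[1])
        limit = _ccw_angle(e1, e2)  # interior angle at v (bigger than pi)
        pp, nn = polygon[i - 2], polygon[(i + 2) % k]
        candidates = [
            incoming,                      # continuation of edge C past v
            (-e1[0], -e1[1]),              # continuation of edge D past v
            (p[0] - pp[0], p[1] - pp[1]),  # parallel to neighbor edge A
            (nn[0] - n[0], nn[1] - n[1]),  # parallel to neighbor edge B
        ]
        dirs = set()
        for d in candidates:
            for s in (1, -1):
                ds = (s * d[0], s * d[1])
                if 0 < _ccw_angle(e1, ds) < limit:  # keep rays entering the region
                    dirs.add(_unit(ds))
        proposals[i] = dirs
    return proposals


L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(propose_lines(L_SHAPE))  # vertex 3, (1, 1), gets two search directions
```

For this rectilinear L-shaped region the parallels to A and B coincide with the continuations of C and D, so only two directions survive.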

Internal edges. If a missing line is totally internal to a body, and
is not detected by the line proposer, its absence will at most cause
conservative behavior in SEE. In some cases its absence does not
confuse SEE at all (figure 'MISSING').

The majority of missing internal edges cause concave regions to
appear (fig. 'BLUM'). They will be detected by a line proposer.
Fig. 'MISSING'

Cases where the disappearance of an internal
line (dotted) does not separate the body.

In (a), the object separates into two.
This case is recognized by Blum's heuristics.
Otherwise, SEE could check for this configuration
as a special case.



External edges. Edges that separate two bodies are called external.
If undetected, their disappearance will cause 'intrepid' errors by
SEE, which are undesirable (see 'Desirability criterion' on page 212).
Two cases result: (1) only part of the edge disappears; there is a
possibility of correction. (2) The whole edge is both external and
missing (and the scene is still 'legal'): a mistake will occur. See
figure 'External Edges'.

Case (1). Only part of an external edge disappears. It can be
detected because

(a) a concave region is generated, and

(b) the region has internal angles bigger than 180° where a line
"goes through": ab is collinear with cd.
Figure 'EXTERNAL EDGES'

A segment separating two bodies may disappear.
(1) If that segment is part of a larger segment,
it is possible to sense and correct the anomaly.
(2) If a whole external edge is missing, its
absence remains undetected, inducing a mistake
in SEE. In (i) an external edge disappears and
creates an illegal figure.

Case (2). The complete edge is missing. Then (b) of case (1) fails,
and detection is difficult.



SPURIOUS EXTRA LINES 



They are lines that "should not be there", such as those
caused by edges of shadows.

Fig. 'LIGHT AND SHADOW'

Each body becomes two; each one is recognized
independently by SEE. Four bodies are found.
Shadows of rectilinear objects travel in planes that (in theory)
divide an object in two (or more): the illuminated part, and the dark
one. Each is a separate object by itself, according to our definition
(see 'Several definitions of a body'), since they have plane boundaries.
SEE should recognize them.

In practice, we have not tried our program with scenes having
lines produced by shadows. A conservative behavior, like that in
figure 'LIGHT AND SHADOW', is expected.

Some shadows gradually diffuse; multiple lights cause multiple
shadows. These problems may have to be solved by assuming or
computing the direction or position of the light sources.



MERGED VERTICES 

Two vertices fused into one will produce a diminution in the number
of useful links they report, since the resulting vertex will be of
type MULTI. Thus, conservative behavior is expected from SEE in these
cases (see figs. L19, L17T, R17, L4, etc.; the program does well in
them, when not too many coincidences are present).

SUGGESTION

It is possible to analyze the vertices of type MULTI and try to
decompose them into simpler types (compare figure R19 with 'WRIST').
Read the comments to R19 and L19.

CONCLUSION 

On scenes obtained from "real world" data, inaccuracies are
expected, and SEE is required to work well despite them.
Currently, the behavior of the program in these cases is not
discouraging, but it is not extremely satisfactory, either. The
additional work needed depends heavily on obtaining genuine
test data, instead of the faked data used in the experiments
described.
BACKGROUND DISCRIMINATION BY COMPUTER

A program determines the regions that belong to the background
of a given scene; that is, the regions that are not members of any
of the bodies. Examples are given.

Need

The program SEE needs to know which regions of the scene belong to
the background (cf. 'SEE, a program that finds bodies in a scene').
At present, this information is supplied by the user, as described
in sections 'Internal Format' and 'Input Format' of a scene.

In the current vision experiments, it is not difficult to 
determine the regions that form the background, since they are always 
black and homogeneous (see first few pictures in this thesis). But 
in more realistic scenes, there will be a great demand for a background 
finding program. 



Therefore, it is interesting to try to
develop a program to separate the "ground"
in the back from the objects in the
"foreground", using only limited information:
the scene as described in section 'Internal
Format', namely, vertices and edges.

That is, we will use in this task only
"geometric" properties.



Such a program has been written, and works automatically under
the command of PREPARA, the function that converts a scene from its
'Input Format' to its 'Internal Format'. When the regions forming
the background are not supplied, PREPARA activates our program,
named BACKGROUND, and these regions are searched for; otherwise,
SEE is supplied with the background regions as declared in 'Input
Format'.
Example. Scene 'HARD'. The results obtained are:

(SUSPICIOUS ARE NIL)
THE BACKGROUND OF HARD IS
(:34 :36 :35)
(:34 :36 :35)

Three regions are found to be part of the background: :34, :36,
and :35. That is correct.

We now proceed to describe the subroutines that make such
identification possible.

Suspicious. In a first pass, we collect the regions that "may be"
background, and call them "suspicious regions". Regions that are
not suspicious are LIMPIO (clean).

Ideally, if a region :R contains L's, FORKs, ARROWs or T's in
the positions below, it is not a part of the background.



FIGURE 'BACKGROUND' (panels (I) to (IV))

In an idealized situation, :R cannot be part of the
background: it is clean, or free of suspiciousness.
:R will be called 'LIMPIO' (clean).
(I) means that the background [almost] never is the internal
part of an 'L' (the region containing the angle smaller than
180 degrees).

(II) means that the background does not contain FORKs.

(III) means that the background is not on the "inside" of an ARROW
(the background is not a 'proper arrow').

(IV) means that the background cannot be the flat region of a 'T';
this in turn means that a body cannot disappear under the
background and then reappear at some other point:




:3 is not the background. 

We reinterpret rules (I)-(IV) as follows:

(I) A region "inside" an L is LIMPIO (clean).

(II) A region containing a Fork is LIMPIO.

(III) A region "inside" an Arrow is LIMPIO.

(IV) A region "on the flat side" of a T is LIMPIO.

Clean Vertex (definition). A vertex is clean with respect to a
region if it indicates, through rules I-IV, that such region is LIMPIO.
For instance, K is clean for :1 and for :2,
since (III) indicates that :1 and :2 are
LIMPIO. K is not clean for :3.

These heuristics are not 100 per cent infallible; also, in a
moderately complicated scene, coincidences of vertices are bound to
occur, causing violations of I-IV. For instance, in figure CORN
(page 150), vertex UU is a Fork belonging to the background, in
contradiction with (II).
For completeness, we present a violation of each of rules I-IV:

FIGURE 'VIOLATIONS': panels (I) to (IV).

:1 is the background. In all four cases,
vertex V violates the rule specified at the
bottom of the figure. These are rare cases.
The situation indicates that rules I-IV
provide noisy information, which has to
be dealt with carefully. That is what is done.

The vertices of each region are analyzed under rules (I)-(IV).
To allow for coincidences of vertices and rare cases (like those in
figure 'VIOLATIONS'), a suspicious region is permitted to
have a small number of clean vertices.

The number of clean vertices is compared with a quantity that
is a small fraction of L (the number of vertices on the boundary);
currently, that quantity is L/9.

- If the number of clean vertices (that is, vertices satisfying
I-IV) is bigger than L/9, we call that region LIMPIO ("clean").
In addition,
   (a) if L is large (bigger than 25, currently),
       that region is BIGFACE, such as :21 of
       scene L19 (page 144);
   (b) otherwise, it is only LIMPIO (normal case).

- If it is not bigger than L/9, then it is SUSPICIOUS. Also,
   (a) if L is large (bigger than 25), the region
       is BACKGROUND;
   (b) otherwise, it is only SUSPICIOUS (normal case).
That is, a LIMPIO region has to have at least
1 + [L/9] (one vertex out of every nine, plus one)
"clean" vertices, where [ ] denotes the integer part.

Example. Region :3 has four 'clean'
vertices (four vertices indicate that :3
is LIMPIO), so it can not be SUSPICIOUS.

Figure 'EQUILIBRIUM'

(This scene is correctly analyzed by SEE.)
None of the three vertices of :1 is clean;
:1 will become Suspicious (a candidate for
background). Five of the seven vertices of
:2 are clean, so :2 is LIMPIO. Note that
vertex C is clean for :2 and not clean
for :1.
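The counting rule can be sketched as a hypothetical Python helper. It is not the function SUSPICIOUS of the listings. It takes L and the number of clean vertices as given, and exposes the current large-region constant, 25, as a parameter. For whole numbers, the test num_clean > L/9 is the same as asking for at least 1 + [L/9] clean vertices.

```python
def classify_region(num_vertices: int, num_clean: int, big: int = 25) -> str:
    """Local classification of one region from its vertex counts.

    num_vertices is L, the number of vertices on the region's boundary;
    num_clean is how many of them satisfy rules I-IV.
    """
    limpio = num_clean > num_vertices / 9  # same as num_clean >= 1 + L // 9
    if limpio:
        return "BIGFACE" if num_vertices > big else "LIMPIO"
    return "BACKGROUND" if num_vertices > big else "SUSPICIOUS"


# Figure 'EQUILIBRIUM': :1 has 3 vertices, none clean; :2 has 7, five clean.
print(classify_region(3, 0), classify_region(7, 5))  # SUSPICIOUS LIMPIO
```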

For example, when we apply the function SUSPICIOUS (see listings)
to every region of scene SPREAD, the suspicious regions turn out to be:
Suspicious only: :35 :18 :34 :2 :3 :12 :11 :33 :37 :47 :48 :46.
Background: :48.

Summary. By analysis of its vertices, each region is either LIMPIO or
SUSPICIOUS. The suspicious regions with more than 25 vertices are
classified right away as BACKGROUND: a suspicious region with many
edges is probably background.

The selection is done entirely using "local" properties: a 
region is classified according to information supplied exclusively 
by its own vertices. 
FIGURE 'SPREAD'

Each region is classified as LIMPIO,
SUSPICIOUS or BACKGROUND.

More global indications. Our goal is to decide which of the
suspicious regions are LIMPIO, and which ones are background.

Since two background regions can not be contiguous (the
background can not be a neighbor of "itself"), suspicious regions that
are contiguous with the background are cleaned, that is, put in the
LIMPIO status.

In our example, :48 is background and therefore its suspicious
neighbor :18 gets cleaned and becomes LIMPIO.

Links are established through the matching T's. We call them
b-links.

Ideally, a suspicious region b-linked to a LIMPIO region
gets cleaned, and a suspicious region b-linked to the background gets
converted to background too.
Idealizing, suspicious region :1
becomes LIMPIO, and suspicious
region :2 becomes background.
A more complicated procedure is
actually used.



In practice, we allow for small errors as follows:

For each suspicious region, we note whether it is b-linked
to background (BA), suspicious (SO), or Limpio (LI) regions.

BA  -   -    If it is b-linked to background regions, we
             change it to Background, except if it has a
             background region as neighbor, in which case we
             do nothing and continue.

()  SO  LI   If not b-linked to background, but b-linked both
             to Suspicious and Limpio regions,
             (1) if LI < SO, continue, do nothing;
             (2) if LI >= SO, classify this region as
                 limpio (LI is the number of LIMPIO regions
                 b-linked to the current region under
                 consideration, SO the number of suspicious ones).

()  SO  ()   If b-linked only to suspicious regions, continue,
             do nothing.

()  ()  LI   If b-linked only to Limpio regions, change it to Limpio.

Note: Sometimes I write Limpio, sometimes LIMPIO; they mean the same.

If not b-linked, continue, do nothing.
We keep applying these rules until no change is observed. In
this way, we eliminate several suspicious regions.
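Here is a Python sketch of this relaxation. It rests on some assumptions. Regions are names. Neighbors and b-links come as hypothetical dictionaries that map a region to a set of regions. Only the three labels LIMPIO, SUSPICIOUS and BACKGROUND are used. The sketch first cleans the suspicious neighbors of the background. It then sweeps the b-link rules in sorted order until nothing changes. That sweep order is a choice of this sketch, and SEE's own order may differ.

```python
LIMPIO, SUSPICIOUS, BACKGROUND = "LIMPIO", "SUSPICIOUS", "BACKGROUND"


def propagate(status, neighbors, blinks):
    """Re-classify suspicious regions; returns a new status dictionary.

    status:    region -> LIMPIO, SUSPICIOUS or BACKGROUND
    neighbors: region -> set of contiguous regions
    blinks:    region -> set of b-linked regions
    """
    status = dict(status)

    def touches_background(r):
        return any(status[n] == BACKGROUND for n in neighbors.get(r, ()))

    # The background can not be its own neighbor: clean suspicious
    # regions that are contiguous with the background.
    for r in list(status):
        if status[r] == SUSPICIOUS and touches_background(r):
            status[r] = LIMPIO

    changed = True
    while changed:  # keep applying the rules until no change is observed
        changed = False
        for r in sorted(status):
            if status[r] != SUSPICIOUS:
                continue
            linked = [status[x] for x in blinks.get(r, ())]
            ba = linked.count(BACKGROUND)
            so = linked.count(SUSPICIOUS)
            li = linked.count(LIMPIO)
            if ba:
                # BA: becomes background unless it already touches background.
                if not touches_background(r):
                    status[r] = BACKGROUND
                    changed = True
            elif li and li >= so:
                # "SO LI" with LI >= SO, or "LI" only: becomes LIMPIO.
                status[r] = LIMPIO
                changed = True
    return status


status = {":1": SUSPICIOUS, ":2": SUSPICIOUS, ":3": LIMPIO, ":4": BACKGROUND}
# As in the idealized figure: :1 becomes LIMPIO, :2 becomes BACKGROUND.
print(propagate(status, {}, {":1": {":3"}, ":2": {":4"}}))
```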

In SPREAD, the suspicious regions were :35, :18, :34, :2, :3,
:12, :11, :33, :37, :47, :48, :46. :48 is known to be the background
(that was found above), so it is no longer suspicious. :18
is a neighbor of the background (:48), and was cleaned in the
previous step.

:11 is b-linked with the LIMPIO :9 and with the suspicious :3.
Therefore, :11 changes to LIMPIO.

:3 is b-linked with the Limpio :11, so the suspicious :3 becomes
Limpio.

:12 is b-linked to the Limpio :10, and gets cleaned.
:46 is b-linked to the background :48, and is made
background, since :46 is not, at this moment, a neighbor of
background.

:34 is b-linked to the background :48, and is made
background, since :34 is not a neighbor of background.

:37 is b-linked to the LIMPIO region :4, and transforms
into LIMPIO.

:35 is b-linked to the region :34, which is background,
so the suspicious region :35 becomes background instead.

:2 is a suspicious region b-linked to the region :35, which
is part of the background. According to our rules, :2 becomes
part of the background.

At the end, only regions :33 and :47 remain suspicious:
(SUSPICIOUS ARE (:33 :47))

We collect all these 'stubborn' suspicious regions and label
them background, except those which are neighbors of background.

SUGGESTION
A better procedure may be to make the exception for
those regions that are neighbors of suspicious regions.
That is, two neighboring suspicious regions prevent
each other from becoming background. I have not explored
this possibility.

In the example SPREAD, :33 and :47 are made background. 

- If no region is background at this point, make one of the
"bigfaces" background. There is room here for improvement.

- If there is still no background, make background the region with
the most vertices. This is not yet implemented.
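The final stage can be sketched in hypothetical Python, again with plain region-to-set dictionaries. Two choices are assumptions of the sketch, not statements of the text. First, "neighbors of background" is judged against the background found before this stage, so two stubborn neighbors do not block each other. Second, when a bigface must be made background, the sketch picks the one with the most vertices. The last fallback, the region with most vertices, is left out because SEE does not implement it.

```python
def finish(status, neighbors, bigfaces=None):
    """Dispose of the 'stubborn' suspicious regions; returns a new dictionary.

    status:    region -> "LIMPIO", "SUSPICIOUS" or "BACKGROUND"
    neighbors: region -> set of contiguous regions
    bigfaces:  optional region -> number of vertices, for BIGFACE regions
    """
    status = dict(status)
    # Background as it stands before this stage (assumption of this sketch).
    background = {r for r, s in status.items() if s == "BACKGROUND"}
    for r, s in list(status.items()):
        if s == "SUSPICIOUS" and not (neighbors.get(r, set()) & background):
            status[r] = "BACKGROUND"
        # Stubborn regions next to the background are left as they are.
    if "BACKGROUND" not in status.values() and bigfaces:
        # Choosing the bigface with the most vertices is an assumption.
        status[max(bigfaces, key=bigfaces.get)] = "BACKGROUND"
    return status


print(finish({":33": "SUSPICIOUS", ":47": "SUSPICIOUS", ":48": "BACKGROUND"}, {}))
```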

In our example, the (final) background regions are:
:33 :47 :35 :34 :2 :48 :46.   <- BACKGROUND OF 'SPREAD'.
Other examples of background finding.

Scene CORN

(SUSPICIOUS ARE NIL)
THE BACKGROUND OF CORN IS
(:22)

Scene BRIDGE

(SUSPICIOUS ARE NIL)
THE BACKGROUND OF BRIDGE IS
(:30)
(:30)
Scene MOMO

One mistake (:31) is produced here.

(SUSPICIOUS ARE (:31))

FIGURE 'MOMO'
The problem is ambiguous. As in the case of body isolation (section
'The Concept of a Body'), the problem of determining the regions that
belong to the background of a scene (regions that belong to no body)
is ambiguous; many solutions are possible, as long as no two
background regions are contiguous.
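The only hard constraint stated here is easy to check. The heuristics exist to choose among the many sets that pass it. A minimal Python check follows. It assumes the background is given as a set of region names, and that a hypothetical `neighbors` dictionary maps each region to the set of regions contiguous with it.

```python
def is_admissible(background, neighbors):
    """True if no two regions of the background set are contiguous."""
    return all(not (neighbors.get(r, set()) & background) for r in background)


adjacency = {":1": {":2"}, ":2": {":1", ":3"}, ":3": {":2"}}
print(is_admissible({":1", ":3"}, adjacency), is_admissible({":1", ":2"}, adjacency))
```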

Among the multitude of solutions there exists a preferred one,
which is "the" standard (common, familiar) interpretation chosen
by people.

Our program also tries to choose, among the many solutions,
the standard one.

Summary

A lenient algorithm finds regions (by analyzing the types of
their vertices, and their neighborhood relations) that may possibly
be background, and labels them "SUSPICIOUS". With the idea of
re-classifying the suspicious regions as 'LIMPIO' (clean, not
background) or 'BACKGROUND', a system of b-links is introduced. These
b-links provide more global information about the scene.

Members of the suspicious set are assigned to one of the other
two sets, while the algorithm tries to minimize the b-links
between Background and Limpio regions.

Fair results are obtained with the algorithm just
described. Sometimes, regions that are genuine components of a
body ("Limpio") are classified as Background, and vice versa.

Refinements are needed, but since in our present vision experiments
the background is a homogeneous black area (see the first few
pictures of this thesis), no emphasis has been placed on them so far.
STEREO PERCEPTION

Summary. So far we have discussed the identification of objects in a
scene and ignored the problem of locating them in three-dimensional
space.

There are several ways to achieve this. We will discuss one of them
here: the use of more than one view of the same scene.

A natural first step is to establish the correspondence between
points in the two views; that is, given a point in one scene (left),
to find the corresponding point in the other scene (right). Theorems
S-1 below and S-2 on page 234 express criteria for this
"stereo matching".



THEOREM S-1

If both cameras are identical, their optical
axes are parallel, and the films or sensitive
surfaces or retinas lie in the same plane,

then a simple necessary condition for two
image points, one in each retina, to
have come from the same 3-dim point
is that both image points (left and
right) have the same y-coordinate,
measured in the direction perpendicular
to the line joining the optical centers.



SEE can independently decompose the left and right scenes into the
bodies forming them, leaving as a problem to determine which of the
objects in the right scene corresponds to an object in the left scene.
This can be done because each object will appear in both views with
the same maximum height and minimum height (highest and lowest values
of the y-coordinate of points belonging to that object); comparisons
are easily made by replacing the objects by "intervals" consisting of
these two numbers.

Further disambiguation can be achieved by the use of the function
(WHERE X_L Y_L X_R Y_R), which determines the (x, y, z) 3-dim position
of a point whose two 2-dim locations (X_L, Y_L) and (X_R, Y_R)
are known {Griffith, AI Memo 143}.
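The text does not say how WHERE computes its answer. For the parallel-axis arrangement of Theorem S-1, standard triangulation from disparity gives the idea. The sketch below implements that textbook method, not Griffith's function. Its camera set-up is assumed: the left optical center is at the origin, the right one is at distance `baseline` along x, both look along +z, and the focal length is `f`.

```python
def where_parallel(xl, yl, xr, yr, f=1.0, baseline=1.0):
    """3-dim point (x, y, z) from a genuine pair, for parallel optical axes.

    Assumed set-up: left optical center at the origin, right one at
    (baseline, 0, 0), both optical axes along +z, focal length f.
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("a point in front of both cameras has xl > xr")
    z = f * baseline / disparity
    y = (yl + yr) / 2 * z / f  # yl == yr for a genuine pair (Theorem S-1)
    return xl * z / f, y, z


# The point (1, 2, 10) seen with f = 1 and baseline 0.5:
print(where_parallel(0.1, 0.2, 0.05, 0.2, baseline=0.5))  # close to (1.0, 2.0, 10.0)
```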
Figure 'POINTS'

Given two images of the same scene, before
we can proceed to situate it in 3-dim space,
it is necessary to know which points of the
left scene correspond to points of the right
scene: we have to discover the genuine pairs
in it, a small subset of the cartesian product
(a, b, c, d) × (e, f, g, h). It is
desirable to have an algorithm that avoids an
exhaustive search on this product.

Genuine Pair (definition). A pair of points (P_L, P_R) produced by a
real 3-dim point of the scene in consideration.

Theorem S-2 below gives conditions that a genuine pair must meet.
A particularization of it produces theorem S-1 above.



THEOREM S-2

The left image P_L and the right image P_R of a point P
have associated with them a variable γ, computable from
(X_L, Y_L) or from (X_R, Y_R), that acquires the same
value on P_L and on P_R. It is invariant under change
of scene.

For the case where the optical axes are parallel,
this variable is simply the y-coordinate (Y_L = Y_R), or
height, of the image.

For the case where the optical axes meet, this
variable is γ, the angle that the plane P_L-C_L-P-C_R-P_R makes
with Γ, the plane containing the optical axes.

Any monotonic function of γ will be just as good
(cf. figure 'GENUINE PAIRS').
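Here is one way to compute γ from image coordinates. It uses conventions the text does not give. Both optical centers lie on the world x axis. Each camera is turned only about the vertical y axis, so the plane Γ is y = 0. Images are formed by a pinhole with focal length f. An image point (u, v) then defines the ray u·e_x + v·e_y + f·a, and γ is the angle of that ray's (y, z) components. The names `camera_axes`, `project` and `gamma` are hypothetical.

```python
import math


def _dot(p, w):
    return sum(a * b for a, b in zip(p, w))


def camera_axes(theta):
    """Image x axis, image y axis and optical axis (world coordinates) of a
    camera turned by theta radians about the vertical y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, 0.0, -s), (0.0, 1.0, 0.0), (s, 0.0, c)


def project(point, center, theta, f=1.0):
    """Pinhole projection of a 3-dim point into image coordinates (u, v)."""
    e_x, e_y, axis = camera_axes(theta)
    p = [a - b for a, b in zip(point, center)]
    depth = _dot(p, axis)
    return f * _dot(p, e_x) / depth, f * _dot(p, e_y) / depth


def gamma(u, v, theta, f=1.0):
    """Angle between the plane (baseline, image ray) and the plane y = 0."""
    # The ray u*e_x + v*e_y + f*axis has y part v and z part f*cos - u*sin.
    return math.atan2(v, f * math.cos(theta) - u * math.sin(theta))


P = (0.3, 0.4, 5.0)  # a 3-dim point
alpha = 0.1          # each optical axis turned inward by alpha radians
uL, vL = project(P, (0.0, 0.0, 0.0), alpha)
uR, vR = project(P, (1.0, 0.0, 0.0), -alpha)
print(gamma(uL, vL, alpha), gamma(uR, vR, -alpha))  # equal, up to rounding
```

With theta = 0 (parallel axes), gamma reduces to atan2(v, f), a monotonic function of v. That is the particularization that yields Theorem S-1.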



From the theorem, the algorithm (referred to in fig. 'POINTS') that
we may use to establish correspondence between points in the two
views is:

Compare only points with the same γ
(or the same y-coordinate).

Points with different γ can not
form a genuine pair.
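Here is the test written as code, assuming parallel axes and a small tolerance for measurement noise. The tolerance is an assumption of this sketch, not a value from the text. The code keeps only those pairs of the cartesian product whose heights agree. It still visits every pair, but compares a single number per pair.

```python
def candidate_pairs(left, right, tol=0.5):
    """Pairs (l, r) whose heights agree within tol (parallel optical axes).

    left, right: point name -> (x, y) image coordinates.
    """
    return [(ln, rn)
            for ln, (_, yl) in left.items()
            for rn, (_, yr) in right.items()
            if abs(yl - yr) <= tol]


# Hypothetical coordinates for the points of figure 'POINTS'.
left = {"a": (1.0, 10.0), "b": (2.0, 20.0), "c": (3.0, 30.0), "d": (4.0, 40.0)}
right = {"e": (0.0, 20.2), "f": (1.0, 10.1), "g": (2.0, 40.0), "h": (3.0, 30.3)}
print(candidate_pairs(left, right))
# [('a', 'f'), ('b', 'e'), ('c', 'h'), ('d', 'g')]
```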

For each body, knowing the 3-dim location of a few of its
vertices will be sufficient to position that body in real space,
achieving in this way the goal of this section.

See Digression 1 in section 'The Concept of a Body' for a
different approach.
Figure 'Y-PARAMETRIZATION'

From geometrical considerations and the coordinates of a
point P_L in L, it is possible to attach to the line A-P_L
an angle γ. Similarly, an angle is obtained for lines of R.
It can now be said that a genuine pair (P_L, P_R) must
have the same γ for P_L and P_R.

γ is a physical quantity, namely the angle that
the plane passing through the image P_L and the optical
centers C_L and C_R makes with the "horizontal" plane Γ
(Γ contains the optical axes). Clearly, for P_L and
P_R to be produced by a point P in 3-dim space, the γ
of P_L must be equal to the γ of P_R. This is a necessary
condition that is easy to check.

A real point P of the scene produces a left image P_L (which has
a certain value of γ) and a right image P_R with the same value of γ
(figure 'Y-PARAMETRIZATION').

Thus, given a point in one scene, we
have to search for its genuine pairs
in the other scene among the points
with the same γ. They will be found
along a straight line through A or B.

Parametrization of the scene is possible not only by using γ;
a monotonic function of γ will do.

For computational efficiency, it may be advisable to store the
points of the scenes in arrays according to the value of their γ's.
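Here is a Python sketch of that storage scheme. Heights are quantized into integer rows of a hypothetical width `step`. A lookup reads only the row of the query point and its two neighbors, to tolerate small errors, so the other view is never scanned in full. The same code works with γ, or any monotonic function of it, in place of y.

```python
from collections import defaultdict


def index_by_height(points, step=1.0):
    """Store point names in rows keyed by quantized height."""
    rows = defaultdict(list)
    for name, (_, y) in points.items():
        rows[round(y / step)].append(name)
    return rows


def candidates(y, rows, step=1.0):
    """Names stored in the row of height y and in the two adjacent rows."""
    k = round(y / step)
    return [name for j in (k - 1, k, k + 1) for name in rows.get(j, [])]


right = {"e": (0.0, 20.2), "f": (1.0, 10.1), "g": (2.0, 40.0), "h": (3.0, 30.3)}
rows = index_by_height(right)
print(candidates(10.0, rows))  # ['f']
```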
The function LINE maps points of L into lines of R.
An image point P_L may have come from different 3-dim points P, P', P'', ...,
all of them situated on the line of sight of P_L. The right images
of P, P', P'', ... all fall on a straight line, which is the intersection
of the shaded plane [called plane P_L-C_L-P-C_R-P_R in fig. 'Genuine Pairs']
and the right retina.
When the optical axes are parallel

In this case, points A and B on line C_L-C_R (fig. 'Genuine Pairs')
travel to infinity, and lines P_L-A and P_R-B become horizontal
(parallel to C_L-C_R). The situation looks like this:

[Figure: left image L and right image R, with horizontal lines of constant height.]

A genuine pair (P_L, P_R) will
have the same y-coordinate for
both of its elements (10.0 in
this case).

So, given a left image point P_L, we have to search only
among the points of R with the same height to find "the" P_R that
will make a genuine pair (P_L, P_R).

But several candidate pairs may be found, because many points may
lie on each horizontal line of R.



USE OF SEE IN STEREO PERCEPTION

We can use the invariance of the variable described in Theorem
S-2 to locate objects in three-dimensional space from a pair of stereo
views (we will suppose parallel axes; the other case is treated
similarly) as follows:

(1) Analyze the left scene with SEE, identifying the
bodies.

(2) Do the same for the right scene.

(3) Reduce each body to an interval formed by two numbers, its
maximum and minimum height, specifying "closed" if the absolute
extreme of the body is known, "open" if not.

In this way we reduce each scene to a set of intervals (see
figure 'INTERVALS').
Each body is reduced
to an interval.

(4) Use these intervals to select which left body will go with which
right body. The answer is simple (because it is unique) even
in moderately crowded scenes.

It is simple to take into account the fact that an open
end of an interval indicates that the interval can extend
further at that end.
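Steps (3) and (4) can be sketched as follows. Several parts are assumptions of the sketch rather than SEE's procedure. The first is the `occluded` flag, which marks where a body is cut off and so makes that end of the interval open. The second is the cost, which gives disagreements at open ends the smaller weight `open_weight`. The third is the brute-force search for the one-to-one assignment with least total cost, which is adequate for a handful of bodies.

```python
from itertools import permutations


def body_interval(vertices):
    """Reduce one body to (lo, hi, lo_open, hi_open).

    vertices: (y, occluded) pairs; 'occluded' is a hypothetical flag saying
    the body is cut off by another object there, so that extreme is unknown.
    """
    lo = min(y for y, _ in vertices)
    hi = max(y for y, _ in vertices)
    lo_open = any(occ for y, occ in vertices if y == lo)
    hi_open = any(occ for y, occ in vertices if y == hi)
    return lo, hi, lo_open, hi_open


def _end_cost(a, b, is_open, open_weight):
    # An open end may extend further, so its disagreement counts for less.
    return abs(a - b) * (open_weight if is_open else 1.0)


def interval_cost(left, right, open_weight=0.25):
    llo, lhi, llo_open, lhi_open = left
    rlo, rhi, rlo_open, rhi_open = right
    return (_end_cost(llo, rlo, llo_open or rlo_open, open_weight)
            + _end_cost(lhi, rhi, lhi_open or rhi_open, open_weight))


def match_bodies(left, right, open_weight=0.25):
    """For each left interval, the index of its right partner, chosen to
    minimize the total cost over one-to-one assignments (brute force)."""
    if len(left) > len(right):
        raise ValueError("this sketch assumes every left body appears on the right")

    def total(perm):
        return sum(interval_cost(l_iv, right[j], open_weight)
                   for l_iv, j in zip(left, perm))

    return list(min(permutations(range(len(right)), len(left)), key=total))


# Intervals of scene L10 - R10, bodies in the order listed in the text.
L10 = [(66, 105, False, True), (79, 120, False, False),
       (68, 152, False, False), (21, 82, False, True)]
R10 = [(67, 154, False, False), (78, 119, False, False),
       (65, 103, False, True), (22, 82, False, True)]
print(match_bodies(L10, R10))  # [2, 1, 0, 3]
```

The result pairs left body 1 with right body 3, 2 with 2, 3 with 1, and 4 with 4. That is the identification the text reaches for scene L10 - R10.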

Sources of difficulties are:

(a) Two bodies have the same interval, meaning they have identical
maximum heights and minimum heights. This is possible, but quite
easy to handle: reduce some faces to intervals and compare them.

(b) A body is seen in the left scene but not in the right scene
(figures L12, R12).

(c) SEE partitions one body in two in one scene, but not in the
other.

The "open" and "closed" indications will help here.
Also, remember that when comparing these intervals we are using
just a very small part of the total information concerning each body.
When the selection is narrowed down to two or three candidates
["left-body 1 is either right-body 2 or right-body 5"], one can use

(1) the WHERE function of Griffith (op. cit.);

(2) as in (a) above, the intervals for each face of the
objects, so as to choose as "genuine pair" those two
objects with more agreement in the intervals of their
faces;

(3) perhaps a face of unusual shape, which is enough for
discrimination if it appears in both the left and right
scenes; or the number of vertices below the center of gravity.



Summary

In summary, I should like to point out that, while much
has been stated within the somewhat constricting framework
of this article, much remains to be stated. Certain, but
not all, important classes of presentations have been
treated, and there remain horizons as yet unexplored.
Conceivably, the author will attempt, ex nihilo nihil fit, to
establish a more general perspective in the course of a
subsequent article.

Also, the reader is referred to other
articles on the same topic.
Scene L10 - R10. SEE analyzes the left and right scenes
independently, obtaining the following bodies:

LEFT SCENE (L10)
(BODY 1. IS :5 :1 :4 :12)
(BODY 2. IS :6 :15 :7 :11 :14)
(BODY 3. IS :8 :9 :10 :3)
(BODY 4. IS :2 :13)

RIGHT SCENE (R10)
(BODY 1. IS %:3 %:5 %:6 %:14)
(BODY 2. IS %:13 %:1 %:11 %:9 %:15)
(BODY 3. IS %:8 %:2 %:10)
(BODY 4. IS %:4 %:7 %:12)

For each of the eight bodies, we compute its minimum height and its
maximum height, obtaining the following intervals (a bracket marks a
closed end, a parenthesis an open one):

L10 bodies            L10 interval   R10 interval   R10 bodies
:5 :1 :4 :12          [66,105)       [67,154]       %:3 %:5 %:6 %:14
:6 :15 :7 :11 :14     [79,120]       [78,119]       %:13 %:1 %:11 %:9 %:15
:8 :9 :10 :3          [68,152]       [65,103)       %:8 %:2 %:10
:2 :13                [21,82)        [22,82)        %:4 %:7 %:12

These intervals are compared (left with right), trying to find
pairs whose values differ by a tolerably small amount [if an
interval has an open end, differences can be larger]. For 'L10 - R10',
these are

[66,105) - [65,103)
[79,120] - [78,119]
[68,152] - [67,154]
[21,82)  - [22,82)

which corresponds to the following identification of bodies:

:5 :1 :4 :12          corresponds to  %:8 %:2 %:10
:6 :15 :7 :11 :14     corresponds to  %:13 %:1 %:11 %:9 %:15
:8 :9 :10 :3          corresponds to  %:3 %:5 %:6 %:14
:2 :13                corresponds to  %:4 %:7 %:12

Once these correspondences between objects in the two images are
found, the function (WHERE ...) {Griffith} will position these bodies
in three-dimensional space, achieving our goal.
CONCLUSIONS



LOOKING BEHIND 



When I started to work on these problems, the idea was to
describe an object by using a model, and with this model in memory,
to search the scene looking for sub-parts of it that would fit the
description.

This work ended (as far as this thesis is concerned) with a 
program that finds bodies without having a model of them. 

But that is good. 

We did not know at the beginning that this could be done. 



LOOKING AHEAD 

a. Suggestions for further work 

b. Comments 

c. Recommendations

d. Summary 

e. Conclusions 

f. Evaluation 

g. Extensions and Implications 



All these matters are
normally found
grouped in a chapter
at the end of the work.



I can only partially lump all these important matters in one 
final section; many times I cite them in context, that is, next to 
the figure or subject that evokes them, or with which they are most 
closely related. As a result, they are spread through the body of 
this dissertation. 

Also,
(1) The box [SUGGESTION] appears throughout this thesis near a
partially unsolved or partially formulated problem, and/or its
partially outlined or partially new solution.

(2) A list of all such suggestion boxes is included in this thesis.

(3) The remaining portion of this section and, in general, the
sections close to the end of this work abound in statements
of types (a.) through (g.).

(4) I have tried to start each section with a brief introduction, and
end it with a summary or conclusion.

(5) The section 'Introduction' (page 10) specifies the problems
treated in this thesis, and the section 'Preliminary View of
Scene Analysis' gives a general view of available
methods.
SUGGESTION

General notation. To put, remove, etc., links, we
may develop a notation that will look like

(WHEN A (Y A) (B :1 C :3 D :2)
      D (K (A F ..)) (A :3 E :4 F :2)
 THEN
      PUT LINK KIND 3 :3 :4
      NO LINK :1 :2 )

"When A is a vertex of type 'Y', and
D is a vertex of type 'K', and
A and D are joined as specified,
then
put a link of kind 3 between regions :3 and :4, and
do not put a link between :2 and :1."

The general notation is

(WHEN P E E')

"when predicate P is satisfied, evaluate expression E (execute
E), otherwise execute E' (which may be missing)".

In this notation, the predicate P corresponds to a geometric
pattern or configuration, and the expressions E and E' to the
establishment or removal of links.

In SEE, this part is handled by LISP functions (hand-coded), 
one for each particular heuristic. The suggestion is to develop this 
general notation, and an interpreter for it. This will speed up 
programming and checking, but will slow down the execution to 
some extent. 
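To give an idea of what such an interpreter could look like, here is a hypothetical sketch in Python (not LISP, and not part of SEE). A rule pairs a predicate P with an action E and an optional E', and actions put or remove links between regions. The example rule mirrors the one above, except that it omits the check of how A and D are joined.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Scene:
    vertex_type: dict                        # vertex name -> "Y", "K", "L", ...
    links: set = field(default_factory=set)  # {(kind, region, region)}


def put_link(scene, kind, a, b):
    scene.links.add((kind, *sorted((a, b))))


def no_link(scene, a, b):
    # Remove every link, of any kind, between regions a and b.
    pair = tuple(sorted((a, b)))
    scene.links = {link for link in scene.links if link[1:] != pair}


@dataclass
class When:
    """(WHEN P E E'): run E if predicate P holds, otherwise E' if present."""
    predicate: Callable[[Scene], bool]
    then: Callable[[Scene], None]
    otherwise: Optional[Callable[[Scene], None]] = None

    def apply(self, scene):
        if self.predicate(scene):
            self.then(scene)
        elif self.otherwise is not None:
            self.otherwise(scene)


def y_meets_k(scene):
    # The joining of A and D is not checked in this sketch.
    return scene.vertex_type.get("A") == "Y" and scene.vertex_type.get("D") == "K"


def link_3_4_not_1_2(scene):
    put_link(scene, 3, ":3", ":4")
    no_link(scene, ":1", ":2")


rule = When(y_meets_k, link_3_4_not_1_2)

scene = Scene({"A": "Y", "D": "K"}, {(1, ":1", ":2")})
rule.apply(scene)
print(scene.links)  # {(3, ':3', ':4')}
```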

Use. The main use of the new notation or language is for trying
new heuristics. Actually, it is not difficult to hand-code a
new heuristic in LISP (see function EVERTICES in listings), because
everything reduces to calls to NOSABO, THROUGHTES, GEV, SUME, etc.
I was thinking that a simple LISP MACRO could transform the
notation (WHEN P E E') into LISP function calls.

Since what the notation or language is really doing is expressing
a two-dimensional configuration as a linear string, a more ambitious
project would be to use the light pen to draw this configuration,
and then have our interpreter or compiler produce the LISP program.
This may look a little like AMBIT-G {Christensen}.
Assigning a name to an object

Problem. SEE has separated a scene into bodies. What are they?
Is there a pyramid among them? Where are the parallelepipeds?

To answer this, information can be supplied to the program in
the form of a symbolic description or model of the object we are
trying to find. A model is an idealized account of a class of objects,
all receiving the same name, like "triangular pyramid" or "house".
Models may have parameters that acquire values after a given instance
of the model has been found in a scene. Examples are "height" or
"length of bottom side".

Some programs that follow the above procedure to name objects
in a scene are described and discussed in a Master's Thesis {Guzman}.
There are difficult problems to be solved if we are to make the
system able to recognize occluded objects in many situations.

One could, of course, bypass SEE and look for particular objects,
as is done by Polybrick {Hawaii 69}, a program that finds
parallelepipeds.
Do not use over-specialized assumptions. Use more information

In trying to solve a problem, people will apply quite different methods.
They may also make quite different assumptions, some of which
may not hold. Due to particular experience, environment, preferences,
etc., some subjects may be using over-specialized assumptions
instead of requesting more data, more information to solve the
problem. We may bias our views and risk arriving at conclusions
(of the "common sense" type) which are valid only for restricted
segments of populations, or in particular conditions or situations.

Holes. For instance, if most of the readers of this thesis [technical
specialists who have learned to read, who are interested in graphical
processing and computers, etc., and who may not be considered a
representative cross-section of Homo sapiens] perceive "objects" a, b
and c of figure 'HOLES' as holes {Winston}, we may be tempted to
conclude that this is a general property, and rush to write a
subroutine to find such orifices. Perhaps other sectors of our
population would simply say, with respect to a, b, c of figure
'HOLES', that "there is not enough information to make a decision"
(see also section 'On optical illusions'). Or they may come up with
different answers, using their own set of assumptions, which may be
different from ours, since their experience is different too.
The Ames Room (see Box) and Gregory (see Box) warn us
of this.

Fig. 'HOLES'

The idea that objects a, b, c
have to be interpreted by all
men, and hence by a program, as
holes in the larger box, is
dangerous {cf. AI Memo 163}.



Another example of over-specialization

For people familiar with
Descriptive Geometry, it is easy to see that figure 'DESCRIPTIVE' (I)
shows a straight line in the first octant. For them, indeed, it
is easy to visualize this line in three dimensions and have a fairly
good idea of its position and orientation in space, just from
figure (I).

Other persons would need a more conventional figure, such as
figure 'DESCRIPTIVE' (II), to visualize the same line, to get the
same idea.

What happened was that the first group of persons were using
specialized knowledge: their minds were trained, figure (I) was
familiar to them, etc.

Figure 'DESCRIPTIVE': panels (I) and (II).



Conclusion

Before looking for heuristics and shortcuts, before making
assumptions, deductions, etc., let us be sure that there is enough
data to solve our problem.
Human perception versus computer perception

Given a two-dimensional line-drawing of a three-dimensional scene, the
problem of finding bodies in it is inherently ambiguous: many 3-dim
scenes can generate the same 2-dim scene.

Multiple solutions are possible. Moreover, the metatheorem given in
this thesis guarantees that a solution always exists, and provides
ways to construct it. We call this solution "trivial"; in effect, it
is trivial to write a computer program that will invariably find it.

From the multitude of possible solutions, human beings select
one, which is* different from the trivial one, and call it the "normal"
or "common" or "standard" or "reasonable" interpretation of the
scene.

Our program SEE also selects one of the many solutions.
How does its selection compare with the human choice?

- When the scene is "clear", in the sense of evoking human
unanimity, SEE will* also select that same answer. Example:
Figure 'TOWER'.

- As the scene or drawing gets complicated or ambiguous, human
behavior deteriorates; opinions split, optical illusions may emerge
(indicating that contradictory evidence is perceived), and several
plausible answers are given.

The answer of SEE in these cases will* be found among the
humanly plausible selections. In some cases, it may not agree
with the majority.

- Finally, people make mistakes. They will see an object that is
not there, or will fail to see an object, or classify it as
"impossible".

But SEE also errs. It sometimes succeeds where people fail;
more often it is the other way around.

* In an overwhelming majority of cases.
TABLE "ASSUMPTIONS"

ASSUMPTIONS MADE BY THE PROGRAM

These assumptions have to be obeyed for SEE to give good results:

- The objects are three-dimensional solids formed by planes (1).
  No needles or cardboards allowed.

- They produce a two-dimensional image or projection where all
  lines are straight (2).

- Faces have no drawings, marks, labels, etc., imprinted on them.

- Objects do not have holes in them.

(1) See section 'On optical illusions' for conditions for partial
lifting of this assumption.

(2) See section 'On curved objects' for conditions for partial lifting
of this assumption.



ASSUMPTIONS NOT MADE BY THE PROGRAM

These assumptions are not necessary for the correct functioning of SEE;
it will work well with or without them.

- Only prisms are allowed.

- The scene is a parallel projection, or isometric drawing.

- The objects are convex.

- The model or description of the object has to be known to SEE.

- The objects have to appear unoccluded or unobstructed in the view.

- The objects have "weight" in the vertical direction and will
  fall if not supported.

- The background is known in advance (see 'On background
  discrimination by computer').

I repeat, our program does NOT rely on these assumptions.
ANNOTATED LISTING OF THE FUNCTIONS USED

You do not have to know these things in order to use SEE (reading
'How to use the program' is enough) or to understand
what it does (it is explained in 'SEE, a program that finds bodies in
a scene'); these things are put here merely for completeness
and to make the inner workings of SEE easier to understand.

A listing is a formal description. There is a stronger reason,
however. A listing of the programs is a formal description, an
algorithm, an exact statement in a formal language of what we may
have been describing, perhaps inaccurately, in a natural language
(English). It becomes the starting point of serious discussions.
The reader who is skeptical at some point, or did not understand
some English statement, can always clarify his doubts in the listing.
To be understandable, the listing has to have annotations, comments.

A mathematician is not forced to explain his work always in natural
language; rather, he is allowed to employ abstract notations,
symbolisms, formalizations of his thoughts (indeed, it is preferable
this way). A programmer should not hide his listings (he should not
be forced to re-state his algorithms exclusively in natural language
{ 68}) and force his readers to use the ambiguous channels
of his natural language communication.

And this brings up another point. Not only should a programmer not
hide the listing (unless there are bugs or incomplete subroutines),
but he should not hide the programs (unless they are banal); by this
I mean that honest and reasonable efforts should be made to give
future potential users access to these programs. Include:

- Documentation
- Listings, tape or card deck names, etc.
- Test data
- Printout of an interaction with such test data,
  including loading, compilation, execution, results.
- Time spent (by machine and by man).

See also R. Rain's letter {C. ACM March 67}.